Open tseemann opened 4 years ago
In fact, bwt_index handles this case. When the reference contains N bases or other IUPAC ambiguity codes, it replaces them with random A/C/G/T bases, because the index encodes each base in 2 bits and can therefore represent only four symbols.
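To make the replacement step concrete, here is a minimal sketch of the idea, not bwt_index's actual code. The function names `sanitize_reference` and `pack_2bit` and the A=0, C=1, G=2, T=3 mapping are assumptions chosen for illustration.

```python
import random

# Assumed 2-bit code per base; the real index may use a different mapping.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}


def sanitize_reference(seq, rng=None):
    """Replace every non-ACGT symbol (N, R, Y, ...) with a random A/C/G/T base."""
    rng = rng or random.Random()
    return "".join(
        base if base in BASE_TO_BITS else rng.choice("ACGT")
        for base in seq.upper()
    )


def pack_2bit(seq):
    """Pack an ACGT-only sequence into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


# Usage: ambiguity codes must be replaced before packing, since 2 bits
# leave no room for a fifth symbol.
clean = sanitize_reference("ACNNRT", random.Random(0))
packed = pack_2bit(clean)
```

One consequence of this design is that a read can align to a region that was N in the original reference, since the index no longer distinguishes substituted bases from real ones.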
That solves (2).
How do you handle (1)?
If there are Ns in a read sequence, they are still aligned with the reference if the read is mapped. They are not counted as mismatches when evaluating alignment quality; however, they do affect the alignment score. If they occur as a block at either end of the read, they are clipped from the read sequence. Do you have other suggestions for handling N bases that I missed in MapCaller?
So they do not count for -maxmm?
Do they count for -maxclip?
Since Ns normally appear at the ends of short reads, they are identified when MapCaller checks the mapping quality at both ends and discards the sub-alignment that includes the Ns. So yes, they count for -maxclip.
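The behaviour described in this thread (terminal N blocks are clipped, internal Ns are not counted as mismatches) can be sketched as follows. This is an illustration under assumptions, not MapCaller's implementation: `clip_terminal_ns` and `count_mismatches` are hypothetical helpers, and how the clipped lengths are compared against -maxclip (per end or in total) is not stated here.

```python
def clip_terminal_ns(read):
    """Return (left_clip, right_clip, core) after removing N blocks at both ends."""
    without_left = read.lstrip("N")
    left_clip = len(read) - len(without_left)
    core = without_left.rstrip("N")
    right_clip = len(without_left) - len(core)
    return left_clip, right_clip, core


def count_mismatches(read_segment, ref_segment):
    """Count differing positions, skipping read positions that are N."""
    return sum(
        1
        for read_base, ref_base in zip(read_segment, ref_segment)
        if read_base != "N" and read_base != ref_base
    )


# Usage: two Ns clipped on the left, one on the right; the internal N
# stays in the alignment but is not counted as a mismatch.
left, right, core = clip_terminal_ns("NNACGNTN")
mismatches = count_mismatches(core, "ACTAT")
```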
N bases in reads.R
How are these each handled in MapCaller?