lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

How are ambiguity characters handled in BWA? #422

Open SaitouAsuka opened 5 months ago

SaitouAsuka commented 5 months ago

I have some reads with UMIs (Unique Molecular Identifiers). Reads that contain the same UMI come from the same template sequence, so they are merged into a consensus read in which some bases are ambiguous. I mapped the consensus reads to the reference genome, and here are the mapping results: (image) In the sequence I highlighted with the red rectangle, there is an ambiguous base 'N' in the read. I want to know why the 'MD' tag (0T85A5) shows 'A' (the base from the reference genome) rather than 'N' (the base from the read). Is there any official documentation on how the aligner handles ambiguous bases?
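To make the question concrete, here is a minimal Python sketch of how the MD string above breaks down. It assumes the convention from the SAM optional-fields specification: numbers count matching positions, single letters are the reference bases at mismatched positions, and `^` prefixes deleted reference bases. `parse_md` and `mismatch_offsets` are hypothetical helpers written for this post, not part of BWA or any library.

```python
import re

# Hypothetical helpers for illustration; not part of BWA or samtools.
MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")

def parse_md(md):
    """Split an MD tag value into (kind, value) tokens, dropping zero-length matches."""
    tokens = []
    for m in MD_TOKEN.finditer(md):
        if m.group(1) is not None:
            n = int(m.group(1))
            if n > 0:
                tokens.append(("match", n))
        elif m.group(2) is not None:
            # Bases deleted from the reference, e.g. "^AC"
            tokens.append(("deletion", m.group(2)))
        else:
            # Assumption from the SAM spec: this letter is the REFERENCE base
            tokens.append(("mismatch", m.group(3)))
    return tokens

def mismatch_offsets(md):
    """Return (offset, reference_base) pairs, offsets counted from the alignment start on the reference."""
    pos = 0
    out = []
    for kind, value in parse_md(md):
        if kind == "match":
            pos += value
        elif kind == "mismatch":
            out.append((pos, value))
            pos += 1
        else:  # deletion consumes reference bases
            pos += len(value)
    return out

print(parse_md("0T85A5"))
print(mismatch_offsets("0T85A5"))
```

Read this way, the letters in `0T85A5` are reference bases, so an 'N' in the read would never be printed in MD itself; it would only appear in the SEQ column.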

Thanks!