Open Jfortin1 opened 2 years ago
I also just run into that issue. Happens to me with many different short sequences of lengths 25-30 against hg19 reference, where I get less hits when running with 5 mismatches than with 4 mismatches (with less mismatches it behaves as expected).
Here is an example: @read1 GTGGGCGGGCAGGTGCAGGTGGGT + ########################
With 4 mismatches I get 501 hits in X1: (bwa aln -N -n4 hg19.fa read1.fastq): 19:7591367-7591594/16,24/+ 0 19 7591571 0 24M * 0 0 GTGGGCGGGCAGGTGCAGGTGGGT ######################## XT:A:U NM:i:0 X0:i:1 X1:i:501
With 5 mismatches I get only 39 hits in X1: (bwa aln -N -n4 hg19.fa read1.fastq): 19:7591367-7591594/16,24/+ 0 19 7591571 7 24M * 0 0 GTGGGCGGGCAGGTGCAGGTGGGT ######################## XT:A:U NM:i:0 X0:i:1 X1:i:39
bwa version: 0.7.17-r1188
-n 5 means that you will be able to map 95% of your reads successfully, assuming a 2% error rate. -n 6 means that you will be able to map 94% of your reads successfully, assuming a 2% error rate. That's why you get less hits. It is not # of mismatches
While aligning a 19nt-long sequence to the human reference genome (hg38) using
bwa aln
, I get the unexpected results to get less reported alignments when using 6 mismatches instead of 5.My toy example fastq file (toyExample.fastq):
My code for 6 mismatches:
I set the seed length to a large number to ignore seed-specific constraints.
Version: 0.7.17-r1198-dirty