lh3 / bwa

Burrow-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

Does the order of reads in fq.gz influence the mapping results? #210

Open loganylchen opened 6 years ago

loganylchen commented 6 years ago

Hello there,

I ran bwa and samtools with this command:

bwa mem -t 10 -k 32 -M -R '@RG\tID:test\tPL:Illumina\tLB:test\tSM:test' \
hg19.fasta test.1.fq.gz test.2.fq.gz   \
| samtools view -bSt hg19.fasta.fai - \
| samtools sort - -o test.sort.bam

bwa version: 0.7.17-r1188; samtools version: 1.8 (using htslib 1.8)

I ran the same fq.gz files twice; the only difference between the two runs was the order of the reads in the files. I checked the md5sum of all the fq.gz files to make sure they are identical apart from read order.

The fq.gz files were sorted with:

seqkit sort -n test1.2.fq.gz -o test1.clean.sort.2.fq.gz

The md5 checksums are below:

## no-sort test1.fq.gz
438bb422e07f37e156381d0d74e096b1  test1.clean.1.fq.gz
b8e60fa1338fa7e3d31cc3d1284c625f  test1.clean.2.fq.gz
## sort test1.fq.gz
c0d625838b39a775b1acf12471cda2eb  test1.clean.sort.1.fq.gz
4cc34f52013105c43acb77bc4fc6e16b  test1.clean.sort.2.fq.gz

## no-sort test2.fq.gz (test1 and test2 are the same except for read order and name)
cc1bf8d125100c05d95881fe86d9d61  test2.clean.1.fq.gz
6e2fcbd45d8947af97cfa5767c9825ae  test2.clean.2.fq.gz
## sort test2.fq.gz
c0d625838b39a775b1acf12471cda2eb  test2.clean.sort.1.fq.gz
4cc34f52013105c43acb77bc4fc6e16b  test2.clean.sort.2.fq.gz
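Another way to confirm that two FASTQ files hold the same reads regardless of order, without writing sorted copies first, is to hash the sorted records. A minimal Python sketch, assuming plain 4-line FASTQ records and enough memory to hold one file; `fastq_digest` is a hypothetical helper, not part of bwa, samtools, or seqkit:

```python
import gzip
import hashlib


def fastq_digest(path):
    """Return an MD5 digest of a gzipped FASTQ file that ignores record order.

    Assumes every record is exactly 4 lines (header, sequence, '+', quality).
    """
    records = []
    with gzip.open(path, "rt") as handle:
        while True:
            header = handle.readline()
            if not header:
                break  # end of file
            seq = handle.readline()
            handle.readline()  # '+' separator line, not part of the comparison
            qual = handle.readline()
            records.append(
                (header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n"))
            )

    digest = hashlib.md5()
    # Sorting makes the digest independent of the order of reads in the file.
    for record in sorted(records):
        digest.update("\n".join(record).encode())
        digest.update(b"\n")
    return digest.hexdigest()


# Example call (file names taken from the listing above):
# fastq_digest("test1.clean.1.fq.gz") == fastq_digest("test2.clean.1.fq.gz")
```

If the two digests match, the files contain the same set of records and differ at most in read order.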

I found that the mapping results differ.

In test1:

ST-E00522:403:HL5K3CCXY:5:1101:2290:62452 97 1 55520052 24 36M114S = 55520052 36 TCTCAAAAATAAATAAGTAAATAACTAGCAGCTGTAGGCTGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAACCCCTCATCTCGTATGCCGTCTTCTGCTTGAAAAAACACGCTCCAGAGAAGGGGCAGCAGAACGACAGAGAA AAAFFJJJJJJJJJJJFJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJFJJJJJJJJJJJJJJJJJJA<F--------7-A)7---))--)-))------------ NM:i:0 MD:Z:36 MC:Z:108S36M6S AS:i:36 XS:i:0
ST-E00522:403:HL5K3CCXY:5:1101:2290:62452 145 1 55520052 24 108S36M6S = 55520052 -36 CCGTCTCACTGACAATCGAACCGGTTACATTGTTTTTTAATGATACGGCGACCACCGAGATCTACACCTAATCGAACACTCTTTCCCTACACGACGCTCTTCCGATCTTCTCAAAAATAAATAAGTAAATAACTAGCAGCTGTAGGCTGG ))----7-7-7----77))))-77-------7A7-JJJAJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA NM:i:0 MD:Z:36 MC:Z:36M114S AS:i:36 XS:i:0

But in test2:

ST-E00522:403:HL5K3CCXY:5:1101:2290:62452 99 1 55520052 60 36M114S = 55520052 36 TCTCAAAAATAAATAAGTAAATAACTAGCAGCTGTAGGCTGGAGATCGGAAGAGCACACGTCTGAACTCCAGTCACAACCCCTCATCTCGTATGCCGTCTTCTGCTTGAAAAAACACGCTCCAGAGAAGGGGCAGCAGAACGACAGAGAA AAAFFJJJJJJJJJJJFJJJJJJJJJJJJAJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJFJFJJJJJJJJJJJJJJJJJJA<F--------7-A)7---))--)-))------------ NM:i:0 MD:Z:36 MC:Z:108S36M6S AS:i:36 XS:i:0
ST-E00522:403:HL5K3CCXY:5:1101:2290:62452 147 1 55520052 60 108S36M6S = 55520052 -36 CCGTCTCACTGACAATCGAACCGGTTACATTGTTTTTTAATGATACGGCGACCACCGAGATCTACACCTAATCGAACACTCTTTCCCTACACGACGCTCTTCCGATCTTCTCAAAAATAAATAAGTAAATAACTAGCAGCTGTAGGCTGG ))----7-7-7----77))))-77-------7A7-JJJAJJJJJJJFJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJJFFFAA NM:i:0 MD:Z:36 MC:Z:36M114S AS:i:36 XS:i:0

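The differences are in the flag (97/145 in test1, 99/147 in test2) and in the mapping quality (24 in test1, 60 in test2).

If the reads had mapped to different locations, I might understand it as an effect of -M, but in both runs they map to the same location and differ only in mapping quality and flag.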
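The flags in the records above (97 and 145 in test1, 99 and 147 in test2) can be decoded bit by bit to see exactly what changed. A minimal Python sketch, assuming the standard FLAG bit definitions from the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` are made up for illustration:

```python
# FLAG bits as defined in the SAM specification, lowest bit first.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


# Compare the flags of each read between test1 and test2.
for flag_test1, flag_test2 in [(97, 99), (145, 147)]:
    differing = decode_flag(flag_test1 ^ flag_test2)
    print(flag_test1, decode_flag(flag_test1))
    print(flag_test2, decode_flag(flag_test2))
    print("bits that differ:", differing)
```

For both reads the only differing bit is 0x2 (proper_pair): it is set in test2 and not in test1. So the two runs disagree on whether the pair is properly paired, not on where the reads align.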

Can someone help me?