lh3 / bwa

Burrows-Wheeler Aligner for short-read alignment (see minimap2 for long-read alignment)
GNU General Public License v3.0

bwa mem different result in repetitive test #395

Closed baoyl818 closed 1 year ago

baoyl818 commented 1 year ago

Hi developers,

I ran bwa mem on the same sample twice, but the two .bam files differ. I then tried several other arguments, listed below, and the output BAM files still differ between two repeated runs. The command lines I tested:

(1) bwa mem -t 20 -R "@RG\tID:tumor" -M ref.hg38 sample.R1.fq.gz sample.R2.fq.gz
(2) bwa mem -t 20 -a -R "@RG\tID:tumor" -M ref.hg38 sample.R1.fq.gz sample.R2.fq.gz (added -a)
(3) bwa mem -t 20 -K 10000000 -R "@RG\tID:tumor" -M ref.hg38 sample.R1.fq.gz sample.R2.fq.gz (added -K 10,000,000)
(4) bwa mem -t 20 -K 1000000000000 -R "@RG\tID:tumor" -M ref.hg38 sample.R1.fq.gz sample.R2.fq.gz (added -K 1,000,000,000,000)
(5) bwa mem -t 1 -R "@RG\tID:tumor" -M ref.hg38 sample.R1.fq.gz sample.R2.fq.gz (set -t 1)

The results differ as follows: the same read ID (V350125465L1C006R06700311073) is placed on chr22 in test 1 but on chr20 in test 2.

test1:
V350125465L1C006R06700311073 83 chr22 11577976 0 27M2I121M = 11577834 -290 ACACCGTGCCCTGGCTGGCAGGATGGGGAGAGGAGGGAGCGTGTCTGTTCACCTGGCCAGCCCTAGGCAACTCTGCAGAGAAAGACACAGGCACTTCCCCTCTGCAGCCAAAGAGTTAAGAAGGCTCGATGTGAAATGAATCATTCCAGG DIDCIGEGFHHEGGHEGGGDGHCFHGHGDHCHHCGHEDDHGEGEHEGFDHCHHFCGHHDGHIGECGHGCDIEGEGICHCHCCCHCHCHDGGICIEEGHHHEHEGGDHHHCDDGCFFFDDHDDHEHEIFDFGDHDDDDHDCDHDFFHIDHH NM:i:7 MD:Z:5A19A30T33G31T25 MC:Z:150M AS:i:115 XS:i:125 RG:Z:tumor
V350125465L1C006R06700311073 163 chr22 11577834 20 150M = 11577976 290 TGGTTATCTTTAGGTAGCAGAATTCAAGACTGCTTCTTTTTTCTTTTCTTCCTACTTGTATGTTATCTCTATTTCCCTGTGTGAGGATTTATGACTGTTGTGATGAAAAGGCTAGTATTCTAACTCCCTGCATCATAAGCACACACCGTG DHIDDFDIDDCFHHCFHHDHFFDDFEEHFHDHGCCFDCDDDDHDDDDIDDHHDFFCCHDFDGCDFCHCHDFCDDHHHDICHCIFHHEDCDFCICHDHDCGDHEDHEEFEHGHCFGDFDCFCFEHDDHHDHDEBHECEEHGFFFGEGHHCI NM:i:3 MD:Z:32A14T99A2 MC:Z:27M2I121M AS:i:137 XS:i:127 RG:Z:tumor

test2:
V350125465L1C006R06700311073 83 chr20 30872005 17 27M2I121M = 30871863 -290 ACACCGTGCCCTGGCTGGCAGGATGGGGAGAGGAGGGAGCGTGTCTGTTCACCTGGCCAGCCCTAGGCAACTCTGCAGAGAAAGACACAGGCACTTCCCCTCTGCAGCCAAAGAGTTAAGAAGGCTCGATGTGAAATGAATCATTCCAGG DIDCIGEGFHHEGGHEGGGDGHCFHGHGDHCHHCGHEDDHGEGEHEGFDHCHHFCGHHDGHIGECGHGCDIEGEGICHCHCCCHCHCHDGGICIEEGHHHEHEGGDHHHCDDGCFFFDDHDDHEHEIFDFGDHDDDDHDCDHDFFHIDHH NM:i:5 MD:Z:5A19A18T103 MC:Z:150M AS:i:125 XS:i:116 RG:Z:tumor
V350125465L1C006R06700311073 163 chr20 30871863 0 150M = 30872005 290 TGGTTATCTTTAGGTAGCAGAATTCAAGACTGCTTCTTTTTTCTTTTCTTCCTACTTGTATGTTATCTCTATTTCCCTGTGTGAGGATTTATGACTGTTGTGATGAAAAGGCTAGTATTCTAACTCCCTGCATCATAAGCACACACCGTG DHIDDFDIDDCFHHCFHHDHFFDDFEEHFHDHGCCFDCDDDDHDDDDIDDHHDFFCCHDFDGCDFCHCHDFCDDHHHDICHCIFHHEDCDFCICHDHDCGDHEDHEEFEHGHCFGDFDCFCFEHDDHHDHDEBHECEEHGFFFGEGHHCI NM:i:5 MD:Z:32A14T67C2C28A2 MC:Z:27M2I121M AS:i:127 XS:i:137 RG:Z:tumor
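For anyone who wants to find every read affected this way rather than spotting one by eye, here is a minimal Python sketch that compares the primary alignments from two runs and lists the reads whose reference name, position, or MAPQ changed. It is only an illustration: the function names `primary_placements` and `differing_placements` are made up for this post, and the sample records are the four alignments from the two tests with SEQ and QUAL replaced by `*` and the optional tags dropped to keep the example short. It reads SAM text (for example, the output of `samtools view`), not BAM directly.

```python
from typing import Dict, Iterable, Tuple

Placement = Tuple[str, int, int]   # (reference name, 1-based position, MAPQ)
ReadKey = Tuple[str, int]          # (read name, mate number: 1, 2, or 0 if unpaired)


def primary_placements(sam_lines: Iterable[str]) -> Dict[ReadKey, Placement]:
    """Collect the placement of each primary alignment from SAM text lines."""
    placements: Dict[ReadKey, Placement] = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        # Only the first five columns are needed; split on any whitespace so
        # both real tab-delimited SAM and space-flattened pastes are accepted.
        fields = line.split(None, 5)
        name, flag = fields[0], int(fields[1])
        if flag & 0x100 or flag & 0x800:
            continue  # skip secondary (0x100) and supplementary (0x800) records
        mate = 1 if flag & 0x40 else 2 if flag & 0x80 else 0
        placements[(name, mate)] = (fields[2], int(fields[3]), int(fields[4]))
    return placements


def differing_placements(
    run_a: Iterable[str], run_b: Iterable[str]
) -> Dict[ReadKey, Tuple[Placement, Placement]]:
    """Return reads present in both runs whose primary placement differs."""
    a = primary_placements(run_a)
    b = primary_placements(run_b)
    return {
        key: (a[key], b[key])
        for key in sorted(a.keys() & b.keys())
        if a[key] != b[key]
    }


# Abbreviated copies of the records above (SEQ/QUAL set to "*", tags dropped).
READ = "V350125465L1C006R06700311073"
TEST1 = [
    f"{READ} 83 chr22 11577976 0 27M2I121M = 11577834 -290 * *",
    f"{READ} 163 chr22 11577834 20 150M = 11577976 290 * *",
]
TEST2 = [
    f"{READ} 83 chr20 30872005 17 27M2I121M = 30871863 -290 * *",
    f"{READ} 163 chr20 30871863 0 150M = 30872005 290 * *",
]

for (name, mate), (first, second) in differing_placements(TEST1, TEST2).items():
    print(f"{name}/{mate}: test1={first} test2={second}")
```

With real data you would pass two open SAM files instead of the lists. Reporting the MAPQ next to each position makes it easy to see whether the reads that move are the ones with low mapping quality, as in the records above.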

My guess is that when a read can map to multiple locations, bwa picks one of them at random for the output BAM file, but I am not sure. So why do the two repeated runs give different results?

And, most importantly, is there any argument that makes the results identical in every repeated run?

Thank you.

baoyl818 commented 1 year ago

Already solved.