samtools / bcftools

This is the official development repository for BCFtools. See installation instructions and other documentation here http://samtools.github.io/bcftools/howtos/install.html
http://samtools.github.io/bcftools/
Other
662 stars 240 forks source link

Parameter -q in bcftools does not work as specified #2187

Open Lilu-guo opened 4 months ago

Lilu-guo commented 4 months ago

Thank for the excellent tool bcftools. I have a problem and want to ask you for advice.

I have a sam file as follows, containing the alignment info of a single read: HISEQ1:93:H2YHMBCXX:1:1101:2861:2099 16 1 124397141 25 250M * 0 0 TTGTGATGTGTGCGTTCAACTCACAGAGTTTAACCTTTCTTTTCATAGAGCAGTTAGGAAACACTCTGTTTGTAAAGTCTGCAAGTGGATATTCAGACCTCGTTGAGGCCTTCGTTGGAAACGGGATTTCTTCATATTCTGCTAGACAGAAGAATTCTCAGTAACTTCCTTGTGTTGTGTTTATTCAACTCACAGAGTTGAATGATCCTTTACACAGAGCAGACTTGAAACACTCTTTTTGTGGAATTTG EHHHHHHCHHDCHGCCHHHHHEGFHGCEG.HHFHFHHHCIHCHHHHIIIHHHHIIHGF@CCEGCHEECHEHIHHEHIHIHCEGHIHIHHIHIIIIHCIIIHIIIHGHHIHIHIHE@EG@D@HFCFIIIIHHHIIIIHHHIIIHIIIIHGHIIIIIIHIHIIHHGEHIHIIIIIIIIIIIHIIIHHHIHHHHIHIIHEHHHFGIIIHEHHFHIHFEGHEGIIIIIIIGIIIIIIIIIIIIIHHFHHDDBDB AS:i:-6 XS:i:-6 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:101T148 YT:Z:UU

I used the following command to do SNP variant calling, but did not get the results: bcftools mpileup -Ou -f /home/lab/gll/simPan/genome.fa -q 20 $NAME.bam | bcftools call -mv -Ov -o $NAME.vcf

However, when I manually change the MapQ value of 25 in the sam file to 30, it can get the SNP variant calling.


My confusion is, I have specified -q as 20, why the MapQ of 25 in the sam file is still ignored? Looking forward to your reply, thank you in advance.

pd3 commented 4 months ago

To cross-reference, this seems to be a duplicate of https://github.com/samtools/bcftools/issues/1923#issuecomment-2118679724.

Since the example does not show the reference sequence, I can only make a few general comments. Variant calling is a compromise between sensitivity and specificity, the bigger problem is poor specificity. Seeing a mismatch in a single read usually does not give enough confidence to deem it a genuine variant. When debugging problems like this, one should start with the raw output from bcftools mpileup, check if tweaking any of the mpileup options can help to obtain reasonable FORMAT/AD and PL values, check if adjusting bcftools call -P can solve the issue.