Open emos8710 opened 1 month ago
Yes, I also found that the specified MQ threshold doesn't seem to take effect. For example, with -q 20 it still uses 30 as the threshold, whereas with -q 40 (greater than 30) it uses 40 as expected.
Maybe we can discuss bcftools further. It would be more efficient to use WeChat or another instant messaging tool.
I am unable to comment without seeing the data, ideally a small BAM slice plus the corresponding FASTA reference chunk.
EDIT: This script can be used to create a small test case: https://github.com/pd3/mpileup-tests/blob/main/misc/create-bam-test
Thanks for your reply, I will get back with some data!
Hi!
I have an issue that is puzzling me. I have two Salmonella samples that I'm comparing, and I expect them to be highly similar if not identical. I mapped them with bowtie2 (three different versions, as I was investigating whether the mapping was the issue), created an mpileup, and called variants with bcftools v1.14 and v1.17 (no difference in the result):
bcftools call -m -v -O v --ploidy 1
I'm getting some variant calls that I don't understand. Here is the best example. There are three copies of each sample as I was checking if there was any difference between bowtie2 versions.
I have also tried running the mpileup with an MQ threshold of 20, which barely makes a difference. For sample 1 there are 23 reads supporting REF and 126 reads supporting ALT. For sample 2 there are 8 reads supporting REF and 112 reads supporting ALT (plus 2 reads supporting a second ALT). However, bcftools seems pretty confident in the REF call for sample 1 and very confident in the ALT call for sample 2. Why is this? I would expect more uncertainty, so that I could filter this position :)
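To put the counts above side by side, here is a minimal sketch that turns them into raw allele fractions. The helper name `allele_fractions` is made up for illustration, and this is only plain arithmetic on the numbers quoted in the post. It is not how bcftools computes genotype likelihoods, which also take base and mapping qualities into account.

```python
def allele_fractions(ref_reads, alt_reads):
    """Return (ref_fraction, [alt_fractions]) from raw supporting-read counts.

    Hypothetical helper for illustration only; it ignores base quality,
    mapping quality, and strand, all of which a real caller may use.
    """
    total = ref_reads + sum(alt_reads)
    if total == 0:
        raise ValueError("no reads at this position")
    return ref_reads / total, [a / total for a in alt_reads]


# Read counts as quoted in the post: (REF reads, [ALT reads, ...]).
samples = {
    "sample 1": (23, [126]),
    "sample 2": (8, [112, 2]),
}

for name, (ref, alts) in samples.items():
    ref_frac, alt_fracs = allele_fractions(ref, alts)
    alt_text = ", ".join(f"{f:.3f}" for f in alt_fracs)
    print(f"{name}: REF fraction {ref_frac:.3f}, ALT fraction(s) {alt_text}")
```

By raw counts alone, the first ALT allele accounts for roughly 85% of reads in sample 1 and roughly 92% in sample 2, so the two samples look similar at this position, which is why the opposite calls are surprising.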