Hi @Wenfei-Xian,
A max MAPQ score of 42 will likely have some effect, but I expect not an enormous one. I suspect that MAPQ differences at the lower end of the range matter more: if MAPQ is well calibrated, the difference between PHRED=42 and PHRED=60 corresponds to a very small additional absolute error probability.
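To make that intuition concrete, here is a minimal sketch of the standard Phred relationship, P(error) = 10^(-Q/10). It is not DeepVariant code, the function name is only illustrative, and it assumes the aligner's MAPQ is well calibrated, which may not hold for every mapper.

```python
def phred_to_error_prob(q: float) -> float:
    """Convert a Phred-scaled quality Q to an error probability, 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Compare a few MAPQ values at the low and high ends of the range.
for q in (10, 20, 42, 60):
    print(f"MAPQ {q}: P(mismapped) = {phred_to_error_prob(q):.2e}")

# Absolute gap between the two ceilings discussed here (Bowtie2-like 42 vs 60).
high_end_gap = phred_to_error_prob(42) - phred_to_error_prob(60)
# Absolute gap between two low-end values, for contrast.
low_end_gap = phred_to_error_prob(10) - phred_to_error_prob(20)
print(f"gap 42 vs 60: {high_end_gap:.2e}")
print(f"gap 10 vs 20: {low_end_gap:.2e}")
```

Under that assumption, the gap between 42 and 60 is on the order of 6e-5 in absolute error probability, while the gap between 10 and 20 is about 0.09.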
I have some bowtie mapped reads handy for a GIAB sample. I think I can conduct a quick experiment to see if that intuition is right.
Hi @Wenfei-Xian
I finished the experiments. There is certainly a noticeable effect from MAPQ limits, more than I expected. For my experiment, I rewrote the BAM file, setting the MAPQ to 60 for any read with MAPQ of 36 or higher (I observed 44 as the highest MAPQ value and more variability in MAPQ values than seen with BWA).
Experiment | SNP Recall | SNP Precision | SNP F1 | Indel Recall | Indel Precision | Indel F1 |
---|---|---|---|---|---|---|
Default BAM | 0.9673 | 0.9967 | 0.9817 | 0.9717 | 0.9956 | 0.9835 |
MAPQ 36+ -> 60 | 0.9758 | 0.9964 | 0.9859 | 0.9829 | 0.9960 | 0.9894 |
This implies you will get better performance with DeepVariant if you set those higher MAPQ values to 60. Note that, in general, DeepVariant hasn't been trained on Bowtie2 data, so you'd likely get better performance overall by retraining it on Bowtie2-mapped reads.
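For anyone who wants to try the same MAPQ rewrite, here is a rough sketch of one way to do it on SAM text. This is not the exact script used for the experiment above: the function names are hypothetical, and the 36 and 60 values are simply the ones from the experiment. It relies only on the SAM layout, where header lines start with `@` and MAPQ is the fifth tab-separated column.

```python
THRESHOLD = 36  # reads with MAPQ >= this are raised (value from the experiment)
NEW_MAPQ = 60   # the value they are raised to


def remap_mapq(sam_line: str, threshold: int = THRESHOLD, new_mapq: int = NEW_MAPQ) -> str:
    """Return one SAM line with MAPQ raised to new_mapq when it is >= threshold."""
    line = sam_line.rstrip("\n")
    if line.startswith("@"):
        return line  # header lines pass through unchanged
    fields = line.split("\t")
    # Column 5 (index 4) of a SAM alignment record is MAPQ.
    if len(fields) >= 5 and fields[4].isdigit() and int(fields[4]) >= threshold:
        fields[4] = str(new_mapq)
    return "\t".join(fields)


def remap_stream(lines):
    """Lazily apply remap_mapq to any iterable of SAM lines."""
    for line in lines:
        yield remap_mapq(line)


# Tiny in-memory demo with made-up records.
demo = [
    "@HD\tVN:1.6\tSO:coordinate",
    "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
    "read2\t0\tchr1\t200\t23\t10M\t*\t0\t0\tACGTACGTAC\tFFFFFFFFFF",
]
for out_line in remap_stream(demo):
    print(out_line)
```

In practice you could feed it the output of `samtools view -h in.bam` (for example by iterating over `sys.stdin`) and pipe the result into `samtools view -b -o out.bam -` to get a BAM back.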
Hello Andrew,
Many thanks !!!
Best, Wenfei
Hello, since the highest mapping quality in bowtie2 is 42, does it affect the mapping quality channel in DeepVariant?