google / deepvariant

DeepVariant is an analysis pipeline that uses a deep neural network to call genetic variants from next-generation DNA sequencing data.
BSD 3-Clause "New" or "Revised" License

Paralogous Regions Germline SNV Calling #900

Closed: sounkou-bioinfo closed this issue 3 weeks ago

sounkou-bioinfo commented 1 month ago

Hi,

Thank you for this great tool. I have a question about SNV calling in paralogous regions with the WES/WGS Illumina models. As I understand it, by default, candidate variants in regions with mapq = 0 are not analyzed. There have been attempts to deal with this issue for other callers: https://www.nature.com/articles/s41467-023-42531-9?fromPaywallRec=true

My question is whether you have data on this issue, and whether you would advise setting the MAPQ threshold to zero or using a masked genome instead.

Thanks
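To make the question concrete, here is a minimal Python sketch of a minimum-MAPQ read filter. `AlignedRead` and `reads_passing_mapq` are hypothetical names, not DeepVariant or pysam APIs, and the thresholds used are illustrative, not DeepVariant's actual defaults. It shows why MAPQ-0 reads, which on a linear reference often come from sequence shared between paralogs, contribute no evidence unless the threshold is lowered to 0.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class AlignedRead:
    """Hypothetical minimal stand-in for an aligned read (not a DeepVariant or pysam type)."""
    name: str
    position: int         # 0-based leftmost reference coordinate
    mapping_quality: int  # MAPQ as reported by the aligner


def reads_passing_mapq(reads: Iterable[AlignedRead], min_mapq: int) -> List[AlignedRead]:
    """Keep only reads whose MAPQ is at least `min_mapq`.

    With min_mapq >= 1, MAPQ-0 reads (typically reads the aligner could place
    equally well at more than one locus) are discarded. With min_mapq = 0,
    every read is kept.
    """
    return [r for r in reads if r.mapping_quality >= min_mapq]


if __name__ == "__main__":
    # Hypothetical pileup at a locus with a paralogous copy elsewhere in the genome.
    pileup = [
        AlignedRead("read1", 1000, 60),
        AlignedRead("read2", 1005, 0),  # multi-mapping read, e.g. from a paralog
        AlignedRead("read3", 1010, 0),
        AlignedRead("read4", 1012, 42),
    ]
    # Illustrative thresholds only.
    for threshold in (5, 1, 0):
        kept = reads_passing_mapq(pileup, threshold)
        print(threshold, [r.name for r in kept])
```

Lowering the threshold recovers evidence in these regions, but the kept MAPQ-0 reads may actually originate from the other paralog, which is the trade-off the question is asking about.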

kishwarshafin commented 1 month ago

Hi,

We have not tested DeepVariant's accuracy in paralogous regions outside of the GIAB high-confidence regions. Both of your proposed solutions could potentially improve accuracy. The only suggestion we can give you is to look at the pangenome mapping + DeepVariant case study here: https://github.com/google/deepvariant/blob/r1.6.1/docs/deepvariant-vg-case-study.md. In that setup, dropping mapq=0 might give you better resolution in the paralogous regions.

sounkou-bioinfo commented 1 month ago

Thank you for this response. I will try that out and weigh it against the additional runtime penalty.