shishenyxx / DeepMosaic

DeepMosaic is a deep-learning-based tool for classifying mosaic single nucleotide variants that does not require matched control information.
https://www.nature.com/articles/s41587-022-01559-w

too many PASS for 30X WGS #33

Closed huangyuanf closed 4 months ago

huangyuanf commented 8 months ago

Hi, my 30x genome (after BQSR and indel realignment) has 100,000 PASS variants from GATK Mutect2 in single-sample mode. Based on your responses to other questions, I filtered out all variants located near indels, homopolymers, and repeats, but 80,000 still remain. That seems like too many. How should I set the gnomAD frequency filter (keep variants below or above a cutoff)?

Is there anything else you do as standard to filter the variants and reduce the number?

Thanks

shishenyxx commented 8 months ago

Hi huangyuanf,

If it's only 30x, you can input one sample to DeepMosaic and see how long it takes; I assume you can finish everything within 24 hours. The output will include the gnomAD frequency. Alternatively, you can use annotation tools like VEP or ANNOVAR to annotate gnomAD frequency and filter on it before feeding the variants into DeepMosaic.
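A minimal sketch of the pre-filtering route, not part of DeepMosaic itself. It assumes the VCF has already been annotated so that each record carries a gnomAD allele frequency in its INFO column. The key name `gnomAD_AF` and the cutoff `0.001` are placeholders (VEP and ANNOVAR each use their own field names, and the thread does not state a threshold), and keeping rare or unannotated variants rather than common ones is also an assumption.

```python
def parse_info(info):
    """Turn a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def keep_variant(line, af_key="gnomAD_AF", max_af=0.001):
    """Return True for PASS records whose population AF is at most max_af.

    af_key and max_af are illustrative assumptions; adjust both to match
    your annotation tool and study design.
    """
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 8 or cols[6] != "PASS":
        return False
    raw = parse_info(cols[7]).get(af_key)
    if raw in (None, True, ".", ""):
        return True  # not annotated: treated here as absent from gnomAD
    # Multi-allelic records may list one frequency per ALT allele.
    values = [float(v) for v in raw.split(",") if v not in (".", "")]
    return not values or max(values) <= max_af


def filter_vcf_lines(lines, af_key="gnomAD_AF", max_af=0.001):
    """Yield header lines unchanged plus the records that pass the filter."""
    for line in lines:
        if line.startswith("#") or keep_variant(line, af_key, max_af):
            yield line
```

To apply it to a file, open the annotated VCF as text and write the output of `filter_vcf_lines(handle)` to a new file, then use that as the DeepMosaic input.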

You can also check the distribution of your variants to see whether they are enriched in certain chromosomes or regions (which would indicate SVs or aneuploidies). Normally there shouldn't be a high number from 30x (Mutect2 single mode), but if you used PCR-amplified library preparation methods, that's a different story.
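One rough way to run that distribution check is to count PASS calls per chromosome, normalize by chromosome length, and look for chromosomes that stand out. The sketch below is only an illustration. The helper names are made up for this example, chromosome lengths must be supplied by the caller (for instance from the reference `.fai` index), and the 3x-over-median cutoff is an arbitrary assumption, not a recommended value.

```python
from collections import Counter


def pass_counts_by_chrom(lines):
    """Count PASS records per chromosome from an iterable of VCF lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) > 6 and cols[6] == "PASS":
            counts[cols[0]] += 1
    return counts


def density_per_mb(counts, chrom_lengths):
    """Convert raw counts to PASS variants per megabase.

    chrom_lengths maps chromosome name to length in bp and is supplied
    by the caller; chromosomes without a known length are skipped.
    """
    return {
        chrom: counts[chrom] / (chrom_lengths[chrom] / 1e6)
        for chrom in counts
        if chrom in chrom_lengths
    }


def flag_outliers(density, fold=3.0):
    """List chromosomes whose density exceeds fold times the median.

    The fold cutoff is an illustrative assumption.
    """
    if not density:
        return []
    values = sorted(density.values())
    mid = len(values) // 2
    if len(values) % 2:
        median = values[mid]
    else:
        median = (values[mid - 1] + values[mid]) / 2
    return sorted(c for c, d in density.items() if median > 0 and d > fold * median)
```

A chromosome flagged this way is worth inspecting for copy-number changes before trusting its calls. The same counting can be repeated over fixed-size windows to look for regional enrichment.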

Hope that helps,

Xiaoxu