shishenyxx / DeepMosaic

DeepMosaic is a deep-learning-based mosaic single nucleotide variant classification tool that does not require matched control information.
https://www.nature.com/articles/s41587-022-01559-w

homopolymer and dinucleotide filter #27

Closed huangyuanf closed 9 months ago

huangyuanf commented 9 months ago

Hi, I noticed that in the Q&A you recommend that, for WGS variants, excluding annotated homopolymer and dinucleotide repeats will remove false positives and increase the validation rate, but decrease the sensitivity. However, I do not know what homopolymer=0 and dinucleotide=0 mean. Is a variant more reliable or less reliable as the value gets closer to zero? What do you recommend?

Thanks

shishenyxx commented 9 months ago

Hi huangyuanf,

Thank you for your question! The homopolymer and dinucleotide repeat annotations indicate whether a variant is close to a homopolymer or dinucleotide repeat in the genome. Selecting variants with a "0" annotation avoids the issue. However, this might also exclude some true positives, as homopolymers and dinucleotide repeats are also where polymerases tend to make mistakes.

So =0 is more reliable; you can try removing any variant whose annotation is not equal to 0.
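
As a rough illustration of the filter described above, here is a minimal Python sketch that keeps only variants annotated 0 for both repeat types. The column names (`homopolymer`, `dinucleotide`), the tab-separated layout, and the sample rows are assumptions made for this example, not the confirmed DeepMosaic output format, so adjust them to match your annotated file.

```python
import csv
import io

# Hypothetical annotated variant table. The column names and rows are
# illustrative assumptions, not the actual DeepMosaic output format.
SAMPLE_TSV = """chrom\tpos\tref\talt\thomopolymer\tdinucleotide
chr1\t1000\tA\tG\t0\t0
chr1\t2000\tC\tT\t1\t0
chr2\t3000\tG\tA\t0\t1
chr3\t4000\tT\tC\t0\t0
"""


def keep_outside_repeats(rows, homopolymer_col="homopolymer",
                         dinucleotide_col="dinucleotide"):
    """Return only the rows annotated 0 for both repeat columns."""
    kept = []
    for row in rows:
        # Values are read as text, so compare against the string "0".
        if (row[homopolymer_col].strip() == "0"
                and row[dinucleotide_col].strip() == "0"):
            kept.append(row)
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
filtered = keep_outside_repeats(rows)

for row in filtered:
    print(row["chrom"], row["pos"], row["ref"], row["alt"])
```

With real data, replace the in-memory sample with an opened file handle. Comparing the number of rows before and after filtering gives a quick sense of how many calls the repeat filter removes.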

Best,

Xiaoxu