mchowdh200 / samplot-ml

MIT License

Is it feasible to correct false negative genotype (genotyped as 0/0, actually 0/1) #11

Open ygwang1 opened 5 months ago

ygwang1 commented 5 months ago

Hi, thanks to the authors for developing this method! I have been looking for a way to further confirm CNV genotypes. I use samplot for visual confirmation, which is very helpful for rare CNVs and for studies with small sample sizes, but manually visualizing and confirming every call is impractical when genotyping CNVs in a large cohort.

WGS-based SV detection is inevitably inaccurate, particularly for short CNVs (<100/200 bp). By visualizing part of the genotyping results, I found both false positives and false negatives for some variants. I would like to resolve these as far as possible rather than simply excluding the variant, especially when it may be valuable in my cohort. I have rarely seen this done in other studies, which seem to prefer excluding such variants during quality control. However, for some CNVs only a few samples had inaccurate genotypes, so I wondered whether those genotypes could be 'corrected'.

That is why I am trying Samplot-ML, but I ran into another problem: Samplot-ML seems able to correct 0/1 and 1/1 genotypes, but false negatives (genotyped as 0/0) do not seem to be correctable, so I wondered whether correcting them with this method might be feasible. I hope my question is clear; apologies if it is long-winded. Looking forward to your reply!

Best, Yige

mchowdh200 commented 5 months ago

That's a good question. During the development of Samplot-ML we explored the idea of also correcting false negatives. What we found was that genotypers like SVTyper were already pretty good at finding true positives given a set of called SVs from something like LUMPY. We experimented with increasing the sensitivity of the SV caller and then applying Samplot-ML, but decided that the number of false positives was not acceptable.

The current incarnation of Samplot-ML focuses on classifying putative 0/1 and 1/1 calls, filtering out false positives while minimizing loss of sensitivity.
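To make the filter-only behaviour concrete, here is a minimal sketch of the decision logic, not Samplot-ML's actual code. The function `review_genotype`, the genotype strings, and the `predict` callback (standing in for whatever the classifier returns for a call) are all hypothetical names introduced for illustration.

```python
# Illustrative sketch only: `review_genotype` and `predict` are hypothetical
# names and do not correspond to Samplot-ML's real interface.
from typing import Callable

# Genotypes that are treated as putative non-reference calls.
NON_REF = {"0/1", "1/1"}


def review_genotype(called_gt: str, predict: Callable[[], str]) -> str:
    """Return the genotype after a filter-only review step."""
    if called_gt not in NON_REF:
        # 0/0 calls are never passed to the classifier, so a false
        # negative at this point stays a false negative.
        return called_gt
    # Putative non-reference calls are re-classified; a predicted 0/0
    # means the call is treated as a false positive.
    return predict()


# A false-positive 0/1 call is filtered to 0/0.
print(review_genotype("0/1", lambda: "0/0"))
# A 0/0 call is returned unchanged; the classifier is never consulted.
print(review_genotype("0/0", lambda: "0/1"))
```

Under these assumptions, recovering false negatives would require sending 0/0 calls to the classifier as well, which is the route that produced too many false positives in the experiments described above.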

ygwang1 commented 5 months ago

Okay, I get it. It's really not an easy problem to solve and it's hard to balance. Thanks a lot for your reply!