greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

DANN: a deep learning approach for annotating the pathogenicity of genetic variants #5

Closed cgreene closed 8 years ago

cgreene commented 8 years ago

Paper needs to be read carefully for relevance https://dx.doi.org/10.1093/bioinformatics/btu703

cgreene commented 8 years ago

Biology: The stated aim is to identify pathogenic variants.

Computational Methods

Results: "We also generated ROC curves showing the models discriminating pathogenic mutations defined by the ClinVar database (Baker, 2012) from likely benign Exome Sequencing Project (ESP; Fu et al., 2013) alleles with a derived allele frequency (DAF) ≥5% (Fig. 1b, n = 10 000 pathogenic/10 000 likely benign)." Is this the exact same model as the one trained on observed vs. simulated? Do pathogenic variants look more like simulated or more like observed?

Summary: Predicting pathogenic variants is clearly important for our overall question ("What would need to be true for deep learning to transform how we categorize, study, and treat individuals to maintain or restore health?"). Precisely how this was done in the paper remains a bit confusing to me, particularly whether the pathogenic model is the same as the observed/simulated model. I also have some relatively minor concerns that the training/testing breakdown may bias the performance estimates. Definitely consider for inclusion due to major topic relevance, though caveats may be important to discuss.

evancofer commented 8 years ago

Model: The supplementary materials (https://cbcl.ics.uci.edu/public_data/DANN/readme) indicate that the ClinVar/ESP (pathogenic/benign) set is for testing, not training. I therefore suspect that they used the model trained on observed/simulated data to classify the ClinVar/ESP data. There is mention of reusing the observed/simulated test set to combat overfitting, but it is not entirely clear whether the ClinVar/ESP test set was used in the same way.
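To make the suspected setup concrete, here is a minimal sketch of that kind of evaluation: fit a scorer on one task (simulated vs. observed) and compute ROC AUC both on a held-out split of the same task and on a separate pathogenic vs. benign set that is never used for fitting. Everything in it is an assumption for illustration: the data are synthetic Gaussians, the "model" is a simple centroid-difference direction rather than the paper's neural network, and the names `make_set`, `fit_centroid_direction`, `score`, and `roc_auc` are hypothetical helpers, not anything from the DANN code.

```python
import random


def roc_auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) identity; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the group of items tied with item i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def make_set(rng, n, pos_mean, neg_mean):
    """Balanced synthetic set: label 1 drawn around pos_mean, label 0 around neg_mean."""
    X, y = [], []
    for label, mean in ((1, pos_mean), (0, neg_mean)):
        for _ in range(n):
            X.append([rng.gauss(m, 1.0) for m in mean])
            y.append(label)
    return X, y


def fit_centroid_direction(X, y):
    """Stand-in for training: w = mean(label 1) - mean(label 0)."""
    dim = len(X[0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for x, label in zip(X, y):
        counts[label] += 1
        for d in range(dim):
            sums[label][d] += x[d]
    return [sums[1][d] / counts[1] - sums[0][d] / counts[0] for d in range(dim)]


def score(w, X):
    """Linear score w . x for every row of X."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in X]


rng = random.Random(0)
# Task A (assumed): simulated (1) vs. observed (0) variants, used for fitting.
train_X, train_y = make_set(rng, 500, (1.0, 1.0), (0.0, 0.0))
heldout_X, heldout_y = make_set(rng, 500, (1.0, 1.0), (0.0, 0.0))
# Task B (assumed): pathogenic (1) vs. likely benign (0), used only for testing.
clin_X, clin_y = make_set(rng, 500, (2.0, 0.5), (0.0, 0.0))

w = fit_centroid_direction(train_X, train_y)
auc_heldout = roc_auc(heldout_y, score(w, heldout_X))
auc_clin = roc_auc(clin_y, score(w, clin_X))
print(f"held-out simulated/observed AUC: {auc_heldout:.3f}")
print(f"pathogenic/benign AUC (never used for fitting): {auc_clin:.3f}")
```

The point of the sketch is only the separation of roles: if the ClinVar/ESP set were also consulted repeatedly during model selection, its AUC would no longer be an unbiased estimate, which is the ambiguity raised above.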

cgreene commented 8 years ago

@evancofer - nice catch!

cgreene commented 8 years ago

This is an interesting paper. I've labeled it for the 'study' component. It's not receiving more discussion at this point so I've closed it. We're now using 'open' papers only for items undergoing active discussion.

agitter commented 7 years ago

I'm re-reading this to write about data simulation for the discussion. As far as I can tell, they are not doing anything new for the simulated data. It appears to come from the CADD paper. It's still worth discussing.