Fast, scalable prediction of deleterious noncoding variants from functional and population genomic data

agitter commented 8 years ago

http://doi.org/10.1101/069682 (DOI still processing http://biorxiv.org/content/early/2016/08/15/069682)

Across many species, a large fraction of genetic variants that influence phenotypes of interest is located outside of protein-coding genes, yet existing methods for identifying such variants have poor predictive power. Here, we introduce a new computational method, called LINSIGHT, that substantially improves the prediction of noncoding nucleotide sites at which mutations are likely to have deleterious fitness consequences, and which therefore are likely to be phenotypically important. LINSIGHT combines a simple neural network for functional genomic data with a probabilistic model of molecular evolution. The method is fast and highly scalable, enabling it to exploit the "Big Data" available in modern genomics. We show that LINSIGHT outperforms the best available methods in identifying human noncoding variants associated with inherited diseases. In addition, we apply LINSIGHT to an atlas of human enhancers and show that the fitness consequences at enhancers depend on cell-type, tissue specificity, and constraints at associated promoters.

agitter commented 8 years ago

I initially added this when I saw "neural network" in the abstract. Now that I'm reading it, I noticed:

Indeed, the model can be considered a type of neural network, albeit one without hidden layers.

Figure 1b indicates they have an input layer directly connected to a sigmoid output layer with two outputs. The features are things like conservation scores, presence of a miRNA or TF binding site, epigenetic signals, etc. (Table 1, Supplementary Table 2). They mention that it could easily be extended to include hidden layers, but we probably want methods to have at least one hidden layer for this review, regardless of how good the method is.
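To make "a neural network without hidden layers" concrete, here is a minimal sketch under stated assumptions. Each output unit applies a sigmoid to a weighted sum of the raw input features, so the whole thing amounts to one logistic regression per output. The function names (`sigmoid`, `no_hidden_layer_forward`), the three feature slots, and all weight values are hypothetical and for illustration only. The sketch does not reproduce LINSIGHT's probabilistic evolutionary model, its training procedure, or what its two outputs actually represent.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def no_hidden_layer_forward(
    features: Sequence[float],
    weights: Sequence[Sequence[float]],
    biases: Sequence[float],
) -> List[float]:
    """Forward pass of a network with no hidden layers (hypothetical sketch).

    weights[k][j] connects input feature j directly to output unit k, so each
    output is sigmoid(w_k . x + b_k): a separate logistic regression per output.
    """
    outputs = []
    for w_k, b_k in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_k, features)) + b_k
        outputs.append(sigmoid(z))
    return outputs


# Hypothetical features for one site: [conservation score, TF-binding indicator, epigenetic signal]
site = [0.8, 1.0, 0.2]
# Illustrative weights for two sigmoid outputs; not fitted LINSIGHT parameters.
W = [[1.5, 0.7, 0.3], [-0.4, 0.2, 0.9]]
b = [-1.0, 0.1]
print(no_hidden_layer_forward(site, W, b))
```

Because there is no hidden layer, the log-odds of each output is linear in the features. Adding a hidden layer would insert a nonlinear transformation between the input features and the sigmoid outputs.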

gwaybio commented 8 years ago

Ok, I will close this issue for now

greenelab / deep-review #83