Large-Scale Discovery of Disease-Disease and Disease-Gene Associations

enricoferrero commented 7 years ago

http://doi.org/10.1038/srep32404 (https://www.nature.com/articles/srep32404)

Data-driven phenotype analyses on Electronic Health Record (EHR) data have recently drawn benefits across many areas of clinical practice, uncovering new links in the medical sciences that can potentially affect the well-being of millions of patients. In this paper, EHR data is used to discover novel relationships between diseases by studying their comorbidities (co-occurrences in patients). A novel embedding model is designed to extract knowledge from disease comorbidities by learning from a large-scale EHR database comprising more than 35 million inpatient cases spanning nearly a decade, revealing significant improvements on disease phenotyping over current computational approaches. In addition, the use of the proposed methodology is extended to discover novel disease-gene associations by including valuable domain knowledge from genome-wide association studies. To evaluate our approach, its effectiveness is compared against a held-out set where, again, it revealed very compelling results. For selected diseases, we further identify candidate gene lists for which disease-gene associations were not studied previously. Thus, our approach provides biomedical researchers with new tools to filter genes of interest, thus, reducing costly lab studies.

This one is quite interesting and I haven't seen it mentioned in here. It's not overly obvious but my understanding is that they are using deep learning:

The neural network is trained by projecting the vectors for context words into a latent representation with multiple non-linear hidden layers and the output softmax layer comprising W nodes, where W is a size of the vocabulary (equal to the number of diseases in our task), while attempting to predict word wt with high probability.
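For what it's worth, here is roughly how I read that sentence, as a minimal NumPy sketch of the forward pass only. Everything in it is my assumption rather than the paper's: averaging the context vectors (CBOW-style), the tanh activations, the layer sizes, and the names `init_params` and `predict_proba` are all hypothetical, since the architecture is not spelled out.

```python
import numpy as np

def init_params(vocab_size, embed_dim, hidden_dims, seed=0):
    """Random weights for an embedding table plus a stack of dense layers.

    vocab_size plays the role of W (the number of diseases); all sizes are
    illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    dims = [embed_dim, *hidden_dims, vocab_size]
    return {
        "embed": rng.normal(0.0, 0.1, (vocab_size, embed_dim)),
        "layers": [
            (rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(dims[:-1], dims[1:])
        ],
    }

def predict_proba(params, context_ids):
    """Return a softmax distribution over all W diseases given context diseases."""
    # Assumption: context vectors are combined by averaging, as in CBOW.
    h = params["embed"][context_ids].mean(axis=0)
    *hidden, (w_out, b_out) = params["layers"]
    for w, b in hidden:
        # Assumption: tanh stands in for the unspecified non-linearity.
        h = np.tanh(h @ w + b)
    logits = h @ w_out + b_out
    logits = logits - logits.max()  # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()

# Toy example: 6 diseases, predict disease 3 from co-occurring diseases 0, 2 and 5.
params = init_params(vocab_size=6, embed_dim=4, hidden_dims=[8, 8])
probs = predict_proba(params, context_ids=[0, 2, 5])
loss = -np.log(probs[3])  # training would minimise this negative log-likelihood
```

Training would then adjust the weights to make `probs[target]` large for diseases that actually co-occur with the context, which is the "predict word wt with high probability" part.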

It could fit in the EHR section even though it's not really about categorisation. I think the most interesting bit is about novel gene-disease associations (which could be targets), so maybe Treat is a better place?

agitter commented 7 years ago

It might fit in Study or Categorize. Most of the EHR work has been discussed in Categorize. The gene-disease associations seem more like Study than Treat to me, though the lines are blurry between these sections.

enricoferrero commented 7 years ago

OK, I've just finished reading it and, despite being a methodological paper, the architecture of the (deep?) neural network for their neural embedding model is not very well described in my opinion. Generally speaking, the neural network aspect is not emphasised, so I'm not sure it should be included. It's still a cool paper though, and the DAG2D performs very well and could be used effectively for novel target identification.

If @agitter or @brettbj (I'm assuming you authored the EHR section) could have a look and advise whether this should be included or not, that'd be great.

Section-wise, I'm not sure how it would fit in Study so I would probably try to incorporate it into the existing EHR section in Categorise (provided you think it's worth including).

agitter commented 7 years ago

@enricoferrero I may not have time to look at this carefully anytime soon. I'm pretty consumed looking over the recent pull requests on other topics. I trust your judgement. We have also been fine leaving out papers that aren't a good fit or don't describe a model or evaluation in sufficient detail.

greenelab / deep-review

Large-Scale Discovery of Disease-Disease and Disease-Gene Associations #342