greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Russ Data Experiment #60

Closed danich1 closed 5 years ago

danich1 commented 5 years ago

This PR contains data from the paper "A global network of biomedical relationships derived from text" (paper here). We used these cluster themes as features to aid our goal of predicting disease-associates-with-gene (DaG) relationships. Unfortunately, these cluster themes do not seem to have predictive power in our case.

TLDR paper summary: This paper applied a bi-clustering algorithm to a matrix of dependency paths generated from every sentence in Pubtator (version Sept 15-2017). After the clusters were generated, they were manually curated to assign a "theme" to each cluster. Lastly, theme distributions were calculated, and certain paths were manually identified as central to their overall theme.
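To make the input to the bi-clustering step concrete, here is a minimal sketch of how a count matrix of entity pairs by dependency paths could be assembled. The records, path strings, and the `build_count_matrix` helper are all hypothetical and only illustrate the shape of the data; this is not the paper's pipeline, and the bi-clustering itself is not shown.

```python
from collections import Counter

# Hypothetical toy records: (entity pair, dependency path string).
# Real input would come from parsed Pubtator sentences; these values are made up.
records = [
    (("BRCA1", "breast cancer"), "mutation|nmod|in"),
    (("BRCA1", "breast cancer"), "mutation|nmod|in"),
    (("BRCA1", "breast cancer"), "associated|nmod|with"),
    (("TP53", "lung cancer"), "mutation|nmod|in"),
    (("TNF", "arthritis"), "overexpressed|nmod|in"),
]


def build_count_matrix(records):
    """Count how often each entity pair occurs with each dependency path.

    Returns sorted row labels (pairs), sorted column labels (paths), and a
    dense list-of-lists matrix where matrix[i][j] is the co-occurrence count.
    """
    counts = Counter(records)
    pairs = sorted({pair for pair, _ in records})
    paths = sorted({path for _, path in records})
    matrix = [[counts[(pair, path)] for path in paths] for pair in pairs]
    return pairs, paths, matrix


pairs, paths, matrix = build_count_matrix(records)
for pair, row in zip(pairs, matrix):
    print(pair, row)
```

A bi-clustering algorithm would then group the rows and columns of such a matrix simultaneously, so that entity pairs sharing similar dependency paths end up in the same cluster.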

Themes used for this experiment:

| Theme Abbreviation | Theme |
| --- | --- |
| U | Causal Mutations |
| Ud | mutations affecting disease course |
| D | drug targets |
| J | role in pathogenesis |
| Y | polymorphisms alter risk |
| G | promotes progression |
| Md | biomarkers (diagnostic) |
| X | overexpression in disease |
| L | improper regulation linked to disease |

cgreene commented 5 years ago

Just to check - this doesn't look like it includes trying them as labeling functions. Is that correct?

danich1 commented 5 years ago

Yes, this is just using them as features. Using them as labeling functions will come later.
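For anyone following along, the difference between the two uses can be sketched roughly as follows. This is a plain-Python illustration, not the code in this PR and not the Snorkel API: the scores, the `theme_features` and `lf_theme` helpers, the 0.5 threshold, and the label convention (1 = positive, 0 = abstain) are all assumptions.

```python
# Theme abbreviations from the PR description.
THEMES = ["U", "Ud", "D", "J", "Y", "G", "Md", "X", "L"]

# Assumed label convention for this sketch only; conventions differ by tool.
POSITIVE, ABSTAIN = 1, 0


def theme_features(scores):
    """Feature use: one numeric column per theme, missing themes default to 0."""
    return [float(scores.get(theme, 0.0)) for theme in THEMES]


def lf_theme(scores, theme, threshold=0.5):
    """Labeling-function use: vote positive if the theme score is high, else abstain."""
    return POSITIVE if scores.get(theme, 0.0) >= threshold else ABSTAIN


# Hypothetical theme scores for one candidate gene-disease sentence.
example = {"U": 0.8, "J": 0.1}

print(theme_features(example))  # feature vector fed to a classifier
print(lf_theme(example, "U"))   # noisy vote used to build training labels
print(lf_theme(example, "J"))
```

As features, the scores are inputs that a classifier weighs against known labels. As labeling functions, they would instead cast noisy votes that help generate the training labels in the first place.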

danich1 commented 5 years ago

No need for an in-depth review in this case. This was a small experiment to see how informative Russ's data on Pubtator was for predicting disease-associates-with-gene relationships. A more important experiment will be coming this week.