Closed danich1 closed 5 years ago
Just to check - this doesn't look like it includes trying them as labeling functions. Is that correct?
Yes, this is just using them as features. Using them as labeling functions will come later.
No need for an in-depth review in this case. This was a small experiment to see how informative Russ's data on Pubtator is for predicting disease associates with gene relationships. A more important experiment will be coming this week.
This PR contains data from the paper "A global network of biomedical relationships derived from text" (paper here). We used the paper's cluster themes as features to aid our goal of predicting disease associates with gene (DaG) relationships. Unfortunately, these cluster themes do not seem to have predictive power in our case.
TL;DR paper summary: the paper applies a bi-clustering algorithm to a matrix of dependency paths generated from every sentence in Pubtator (version Sept 15-2017). After the clusters were generated, each one was manually curated and assigned a "theme". Lastly, theme distributions were calculated, and certain paths were manually identified as central to their overall theme.
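For readers who want a concrete picture of "themes as features", here is a minimal sketch. It is not the code in this PR. The dependency path strings, theme names, and scores are made-up placeholders, and `theme_features` is a hypothetical helper. It sums the theme scores of every dependency path seen for a candidate pair and normalizes them into a fixed-length feature vector.

```python
from collections import defaultdict

# Hypothetical theme scores per dependency path (placeholder values,
# not taken from the paper's released data).
PATH_THEME_SCORES = {
    "mutations|nmod|END_ENTITY": {"causal_mutations": 0.8, "biomarker": 0.1},
    "associated|nmod|END_ENTITY": {"biomarker": 0.5, "causal_mutations": 0.2},
}

# Fixed theme order so every candidate yields a vector of the same length.
THEMES = ["causal_mutations", "biomarker"]


def theme_features(paths, path_theme_scores=PATH_THEME_SCORES, themes=THEMES):
    """Return a normalized theme distribution for one candidate pair.

    `paths` is the list of dependency paths observed between the two
    entities. Paths without theme scores contribute nothing.
    """
    totals = defaultdict(float)
    for path in paths:
        for theme, score in path_theme_scores.get(path, {}).items():
            totals[theme] += score
    norm = sum(totals.values())
    # With no known paths, fall back to an all-zero vector.
    return [totals[t] / norm if norm else 0.0 for t in themes]


if __name__ == "__main__":
    print(theme_features(["mutations|nmod|END_ENTITY",
                          "associated|nmod|END_ENTITY"]))
```

The resulting vectors could then be passed to any standard classifier. Whether to normalize, or to keep raw summed scores, is a design choice this sketch does not settle.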
Themes used for this experiment: