greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Dataset statistics #76

Closed: danich1 closed this pull request 5 years ago

danich1 commented 5 years ago

This pull request uploads statistics about each relationship type: compound treats disease (CtD), gene interacts gene (GiG), and compound binds gene (CbG). The main files that need review are the dataset_statistics.ipynb notebooks. The other added files are data files that these notebooks generate for use in downstream experiments.

Since this is the first time I am working with the gene interacts gene relationship, I have added a notebook that generates a dataframe containing information about gene entities and the statistics they share with each other. Feel free to take a look at that notebook (gene_gene_datafile_generator.ipynb), but it isn't imperative that you review it.

dhimmel commented 5 years ago

Okay, will look at dataset_statistics.ipynb.

danich1 commented 5 years ago

Thanks @ajlee21 for your feedback and comments. As discussed, I changed the way the Venn diagram looks to avoid confusion in interpretation. If you have any additional comments, let me know.

Pinging @dhimmel to see if you have anything you would like to add about this pull request.