greenelab / snorkeling

Extracting biomedical relationships from literature with Snorkel 🏊

Updating the Repository with Bug Fixes and Stratified File Strategy #29

Closed danich1 closed 6 years ago

danich1 commented 6 years ago

The majority of this PR consists of Jupyter cell reorganization and bug fixes. The newly added file is 1a.stratify-candidates.ipynb, a notebook that takes all disease-gene pair mappings and sorts them into train, test, and development categories. Lastly, the disease_gene_lf.py file has some labeling functions that are ahead of the game. Feel free to gloss over it, since this PR turned out to be a little larger than I expected.
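For readers unfamiliar with the idea, here is a minimal sketch of what sorting disease-gene pairs into train, development, and test categories could look like. It is not the code from 1a.stratify-candidates.ipynb: the function name `stratify_pairs`, the tuple layout, the label used for stratification, the 70/20/10 fractions, and the seed are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratify_pairs(pairs, fractions=(0.7, 0.2, 0.1), seed=100):
    """Assign each (disease_id, gene_id, label) tuple to a split.

    Hypothetical helper: pairs are grouped by label first, so that each
    split keeps roughly the same label proportions. Only the train and
    dev fractions are used directly; the remainder goes to test.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible

    # Group pairs by their (assumed) label.
    by_label = defaultdict(list)
    for disease_id, gene_id, label in pairs:
        by_label[label].append((disease_id, gene_id))

    assignment = {}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        train_end = int(len(group) * fractions[0])
        dev_end = train_end + int(len(group) * fractions[1])
        for i, pair in enumerate(group):
            if i < train_end:
                assignment[pair] = "train"
            elif i < dev_end:
                assignment[pair] = "dev"
            else:
                assignment[pair] = "test"
    return assignment
```

Calling `stratify_pairs` on a list of `(disease_id, gene_id, label)` tuples returns a dictionary that maps each `(disease_id, gene_id)` pair to `"train"`, `"dev"`, or `"test"`. Splitting within each label group, rather than over the whole list, is what keeps the splits comparable.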

dhimmel commented 6 years ago

Did not review closely, but let's squash merge.

Note that unless a PR has a well-defined scope, it is difficult to review. This PR does many things... So if you'd like more productive review, make many smaller PRs. This can be difficult with a notebook-oriented workflow.