igematberkeley / NLPChemExtractor


Building labeled chemprot dataset #29

Open jacob-Iuo opened 3 years ago

jacob-Iuo commented 3 years ago

Just standard data cleaning. The purpose of this dataset is to test labeling functions for feature discovery and model validation. The path to the dataset on Savio is fc_igemcomp/2020_nlp/snorkel/chemprot_sentence_level_cleaned.csv
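For anyone picking this up, here is a minimal sketch of how a labeling function could be checked against the labeled rows (coverage and accuracy on the rows where it fires). It uses only the standard library and an inline sample instead of the real file. The column names `sentence` and `label`, the label encoding, and the `lf_contains_inhibit` function are assumptions for illustration, not the actual schema of the cleaned CSV.

```python
import csv
import io

# Assumed label encoding: 1 = relation present, 0 = absent, -1 = abstain.
ABSTAIN = -1
POSITIVE = 1
NEGATIVE = 0


def lf_contains_inhibit(sentence):
    """Hypothetical labeling function: vote POSITIVE on any mention of inhibition."""
    return POSITIVE if "inhibit" in sentence.lower() else ABSTAIN


def evaluate_lf(lf, rows, text_col="sentence", label_col="label"):
    """Return (coverage, accuracy) of one labeling function on labeled rows.

    coverage = fraction of rows where the function does not abstain
    accuracy = fraction of non-abstaining votes that match the gold label
    """
    votes = [(lf(r[text_col]), int(r[label_col])) for r in rows]
    fired = [(v, y) for v, y in votes if v != ABSTAIN]
    coverage = len(fired) / len(votes) if votes else 0.0
    accuracy = sum(v == y for v, y in fired) / len(fired) if fired else None
    return coverage, accuracy


# Tiny inline stand-in for the real CSV (made-up rows, assumed columns).
SAMPLE_CSV = """sentence,label
Aspirin inhibits cyclooxygenase.,1
Glucose was measured in plasma.,0
The compound did not inhibit the kinase.,0
Caffeine binds adenosine receptors.,1
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
coverage, accuracy = evaluate_lf(lf_contains_inhibit, rows)
print(f"coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

To run this on the real data, the `io.StringIO(SAMPLE_CSV)` handle would be swapped for an open file handle on the CSV, with `text_col` and `label_col` set to whatever the columns are actually called.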

jacob-Iuo commented 3 years ago

Currently tagging more enzymes manually because ChemProt only flags enzymes with substrates and products.
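One way the manual pass might be narrowed down is to pre-screen sentences for tokens that look like enzyme names. The sketch below is only a heuristic, assuming that most enzyme names end in "-ase". The regex, the `NOT_ENZYMES` exclusion list, and the `candidate_enzymes` helper are all hypothetical and not part of this repo. It will miss plurals ("kinases") and enzymes without the suffix (e.g. trypsin), so it can only suggest candidates for a human to confirm.

```python
import re

# Heuristic: a word (hyphens allowed) that ends in "ase".
ENZYME_SUFFIX = re.compile(r"\b[A-Za-z][A-Za-z0-9-]*ase\b", re.IGNORECASE)

# Common English words ending in "ase" that are not enzymes (illustrative, incomplete).
NOT_ENZYMES = {
    "base", "case", "phase", "increase", "decrease",
    "disease", "release", "database", "purchase", "please",
}


def candidate_enzymes(sentence):
    """Return tokens in the sentence that look like enzyme names."""
    return [
        m.group(0)
        for m in ENZYME_SUFFIX.finditer(sentence)
        if m.group(0).lower() not in NOT_ENZYMES
    ]


print(candidate_enzymes("Hexokinase activity showed an increase in the disease group."))
```

Sentences that return a non-empty list could be queued for manual review first, which keeps the final call with a person while cutting down the amount of text to read.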