outbreak-info / outbreak.info-resources

A curated repository of metadata of resources on COVID-19 and SARS-CoV-2
MIT License

[PUBLICATION] Train model to categorize litcovid data by topicCategories #111

Closed gtsueng closed 2 years ago

gtsueng commented 4 years ago

Potential training docs for each category can be found here: https://github.com/SuLab/outbreak.info-resources/blob/master/metadata/pmids_for_training.tsv

Use the PMID list to pull the abstracts needed to train the model.
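One way to pull the abstracts is the NCBI E-utilities `efetch` endpoint. The following is a minimal sketch, not the approach actually used for this issue; the helper names `build_efetch_url` and `fetch_abstracts` and the batch size of 100 are assumptions made for illustration.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint for PubMed records.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(pmids):
    """Build an efetch URL that requests plain-text abstracts for the given PMIDs."""
    params = {
        "db": "pubmed",
        "id": ",".join(str(p) for p in pmids),
        "rettype": "abstract",
        "retmode": "text",
    }
    return f"{EFETCH}?{urlencode(params)}"


def fetch_abstracts(pmids, batch_size=100):
    """Fetch abstracts in batches (hypothetical helper; batch size is an assumption).

    Returns one text blob per batch; splitting a blob into individual
    records is left out of this sketch.
    """
    blobs = []
    for start in range(0, len(pmids), batch_size):
        batch = pmids[start:start + batch_size]
        with urlopen(build_efetch_url(batch)) as resp:
            blobs.append(resp.read().decode("utf-8"))
    return blobs
```

For a large PMID list, NCBI asks clients to rate-limit requests, so a short pause between batches (or an API key) would likely be needed.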

topicCategories are NOT mutually exclusive, and each publication may be categorized with multiple topicCategories.

Whenever possible, use the most specific topicCategories available (i.e., subCategories = True); default to the broader categories (i.e., subCategories = False) only when no specific topicCategories are available.
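The two rules above (multiple labels per publication, specific categories preferred over broad ones) could be applied when building the training labels roughly as follows. This is a sketch under assumptions: the column names `pmid`, `topicCategory`, and `subCategories` are hypothetical and may not match the real header of `pmids_for_training.tsv`, and the category names in the sample are illustrative only.

```python
import csv
import io
from collections import defaultdict


def labels_per_pmid(tsv_text):
    """Map each PMID to its list of training labels.

    Column names are assumptions, not the confirmed TSV header.
    A PMID keeps all of its specific labels (subCategories = True);
    it falls back to its broad labels only when it has no specific one.
    """
    specific = defaultdict(set)
    broad = defaultdict(set)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        is_specific = row["subCategories"].strip().lower() == "true"
        target = specific if is_specific else broad
        target[row["pmid"]].add(row["topicCategory"])

    labels = {}
    for pmid in set(specific) | set(broad):
        chosen = specific.get(pmid) or broad.get(pmid, set())
        labels[pmid] = sorted(chosen)
    return labels


# Illustrative rows only; not taken from the real file.
sample = (
    "pmid\ttopicCategory\tsubCategories\n"
    "111\tTreatment\tFalse\n"
    "111\tDrug Repurposing\tTrue\n"
    "111\tVaccines\tTrue\n"
    "222\tTransmission\tFalse\n"
)
result = labels_per_pmid(sample)
```

Here PMID `111` keeps both of its specific labels and drops the broad one, while PMID `222` falls back to its broad label. The resulting label lists can then be binarized for any multi-label classifier.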

gtsueng commented 2 years ago

The model can probably be improved by someone with more expertise, but it does the job for now. It can be found here: https://github.com/outbreak-info/topic_classifier