piskvorky / gensim-data

Data repository for pretrained NLP models and NLP corpora.
https://rare-technologies.com/new-api-for-pretrained-nlp-models-and-datasets-in-gensim/
GNU Lesser General Public License v2.1
965 stars 128 forks

New Corpus - Semantic Scholar #45

Open abhirupnandy opened 3 years ago

abhirupnandy commented 3 years ago

Semantic Scholar Open Corpus is a very large dataset of research article metadata. It can be used to analyze research articles, find similarities between them, and for topic modeling, keyphrase extraction, document clustering, and much more. Can a pre-trained model for this dataset be published?

Semantic Scholar Open Corpus

piskvorky commented 3 years ago

Sure, why not. Can you train it & open a PR?

In the PR, please include clear motivation and all scripts you used, so the result is reproducible and its impact is clear to potential users. Thanks!
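As a starting point for such a reproducible script, here is a minimal sketch of a streaming corpus reader. It assumes the corpus is distributed as JSON-lines files (optionally gzipped) and that each record stores its text under `title` and `paperAbstract`; those field names, the tokenizer, and the helper names `iter_documents` and `CorpusReader` are assumptions for illustration, not something confirmed in this thread.

```python
import gzip
import json
import re
from pathlib import Path

# Deliberately simple tokenizer: lowercase runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def iter_documents(path, fields=("title", "paperAbstract")):
    """Yield one token list per record of a JSON-lines file.

    The default field names are an assumption about the corpus schema;
    adjust them to match the actual release.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Missing or null fields contribute an empty string.
            text = " ".join(str(record.get(field) or "") for field in fields)
            tokens = TOKEN_RE.findall(text.lower())
            if tokens:
                yield tokens


class CorpusReader:
    """Restartable iterable, so a trainer can make several passes over the file."""

    def __init__(self, path, fields=("title", "paperAbstract")):
        self.path = path
        self.fields = fields

    def __iter__(self):
        return iter_documents(self.path, self.fields)
```

Because the reader streams records one at a time, the full corpus never has to fit in memory. An instance such as `CorpusReader("sample.jsonl.gz")` (hypothetical file name) could then be passed as the `sentences` argument of a gensim model like `Word2Vec`; the model type and hyperparameters are left open here, since the thread does not specify them.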