Closed: usuyama closed this issue 4 years ago
Thank you for your question!
In the paper, we extracted articles with a simple filter based on MeSH Tree Numbers. We are currently submitting the paper to a journal and will provide this filter, if necessary, as the peer review progresses.
The following is an overview. Filtering by MeSH ID shows, for example, that articles tagged with "Diseases [C]" account for about 11GB of the 20GB of PubMed abstracts we collected. Because PubMed contains a lot of basic research that targets non-human subjects, we used the MeSH Tree Structures to focus on articles that are likely related to human medicine. For example, articles tagged with Technology, Industry, and Agriculture [J], Information Science [L], Plant Structures [A18], Fungal Structures [A19], Bacterial Structures [A20], ..., or Viral Structures [A21] are excluded, resulting in the focused PubMed abstracts (1.8GB).
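For illustration only, an exclusion filter of this kind could be sketched as below. This is not the filter used in the paper: the function names, the input format (a list of MeSH tree numbers per article), and the choice to drop articles without any MeSH annotation are all assumptions, and the exclusion list contains only the categories named above, not the full set.

```python
# Hypothetical sketch of a MeSH tree-number exclusion filter.
# Only the categories named in this thread are listed; the real list is longer.
EXCLUDED_NODES = ("J", "L", "A18", "A19", "A20", "A21")


def is_under(tree_number: str, node: str) -> bool:
    """Return True if tree_number is the given node or one of its descendants."""
    if len(node) == 1:
        # Top-level category letter, e.g. "J" covers "J01.040".
        return tree_number.startswith(node)
    # Deeper node, e.g. "A18" covers "A18" and "A18.024" but not "A180".
    return tree_number == node or tree_number.startswith(node + ".")


def is_human_focused(tree_numbers: list[str]) -> bool:
    """Keep an article only if none of its tree numbers falls under an excluded node.

    Assumption: articles with no MeSH tree numbers are dropped.
    """
    if not tree_numbers:
        return False
    return not any(
        is_under(tn, node) for tn in tree_numbers for node in EXCLUDED_NODES
    )


if __name__ == "__main__":
    # Made-up articles: ID -> MeSH tree numbers attached to the article.
    articles = {
        "article-1": ["C04.557.337"],
        "article-2": ["C04.557.337", "A18.024"],
        "article-3": ["J01.040"],
    }
    kept = [aid for aid, tns in articles.items() if is_human_focused(tns)]
    print(kept)
```

Matching on the node plus a trailing dot avoids accidentally treating an unrelated tree number as a descendant of an excluded node.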
Please also refer to the MeSH Browser.
Thanks, Shoya
Thanks for your prompt reply. Good luck with your journal submission!
Thanks for releasing this great repo!
I'm wondering about the criteria for the fP dataset. How did you choose "closely related to human beings"?
Thanks, Naoto