hscells / cui2vec

Utility for cui2vec in Go
MIT License

Training data? #3

Open memray opened 5 years ago

memray commented 5 years ago

Hi,

Thank you for sharing this amazing study. How can we acquire the datasets used to train cui2vec?

Thank you, Rui Meng

hscells commented 5 years ago

Hi Rui,

This code is not affiliated with the authors of the publication. I recommend asking the actual authors.

> a nationwide US health insurance plan with 60 million members over the period of 2008-2015

I do not see references in the paper for this dataset, so it is likely not available for public use (see section 3.1).

> a dataset of concept co-occurrences from 20 million notes at Stanford

The authors reference this paper: https://www.nature.com/articles/sdata201432 (Building the graph of medicine from millions of clinical narratives)

> an open access collection of 1.7 million full text journal articles obtained from PubMed Central (I know this is accessible)

This appears to be a subset of PMC, which is indeed freely available: https://www.ncbi.nlm.nih.gov/pmc/ (see the section called Developers). But it is unclear what the authors did to filter the articles.

If you do decide to contact the authors, it would be great if you could reply to this issue with their answers, as I think they would benefit anyone else wondering the same thing.

Cheers, Harry

memray commented 5 years ago

Hi Harry,

Thank you for your kind reply! I will contact the authors and come back once I have the answer.

Best, Rui