outbreak-info / outbreak.info-resources

A curated repository of metadata of resources on COVID-19 and SARS-CoV-2
MIT License
0 stars 4 forks

[DATASET, etc.] Create Mendeley Dataset Parser #94

Closed flaneuse closed 4 years ago

flaneuse commented 4 years ago

More specific version of #10. Basic goal: find all COVID-19 / SARS-CoV-2 datasets in Mendeley, map them to our Dataset schema, and integrate them into api.outbreak.info/resources.

Write parser.py

  1. Get all datasets/files related to COVID-19 or SARS-CoV-2 using the Mendeley API.

Note: you'll need to get each of the datasets/etc. within the 4 collections.

Note 2: you'll need to create an account with them to use the API.

  2. Export the metadata in Schema.org JSON-LD format (not sure if this is available from the API directly). Alternatively, we could grab the data by crawling each URL and extracting the <script type="application/ld+json"> tag, which you can view in Google's Structured Data Testing Tool.

Note: I only briefly skimmed the API guide, so I'm not certain whether you can grab this directly from the search API call, whether you'll need to run a search call to get all the COVID-related IDs and then another API call to get the metadata, or whether you'll need to scrape the pages to get the schema.org metadata.
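
If we end up going the scraping route, the JSON-LD extraction could look roughly like the sketch below. It uses only the Python standard library and works on an HTML string that has already been downloaded; the names JsonLdExtractor and extract_jsonld are made up for illustration, and the sample page is invented, not a real Mendeley record. Fetching the pages and mapping the result to our Dataset schema are not covered here.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags.

    Hypothetical helper for illustration; not part of any existing parser.
    """

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.records = []

    def handle_starttag(self, tag, attrs):
        # The type attribute may be missing or valueless, hence the "or ''".
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        # Script contents can arrive in several pieces, so buffer them.
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            raw = "".join(self._chunks).strip()
            if not raw:
                return
            try:
                self.records.append(json.loads(raw))
            except json.JSONDecodeError:
                # Skip malformed blocks instead of failing the whole page.
                pass


def extract_jsonld(html):
    """Return a list with one parsed object per JSON-LD script tag in html."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


if __name__ == "__main__":
    # Invented sample page, only to show the call.
    sample_html = """
    <html><head>
    <script type="application/ld+json">
    {"@context": "https://schema.org", "@type": "Dataset",
     "name": "Example COVID-19 dataset"}
    </script>
    </head><body></body></html>
    """
    for record in extract_jsonld(sample_html):
        print(record.get("@type"), "-", record.get("name"))
```

Malformed JSON-LD blocks are skipped silently in this sketch; a real parser.py would probably want to log the offending URL so we can check those records by hand.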
