sckott / auditopendata

audit open data in articles

get all HTTP links from open access papers #1

Open max-mapper opened 7 years ago

max-mapper commented 7 years ago
  1. get all open access fulltexts
  2. extract all HTTP links
  3. release as a big list of http links and the papers they came from
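Steps 2 and 3 could be sketched roughly as below. This is a minimal illustration, not an existing tool: the regular expression is deliberately simple and will miss or mangle some URLs, and the names `extract_links` and `link_table`, along with the idea of keying papers by an identifier such as a DOI, are assumptions made for the example.

```python
import re
from typing import Dict, List, Tuple

# Deliberately simple pattern (an assumption, not a full URL grammar):
# an http(s) scheme followed by characters that are not whitespace,
# quotes, angle brackets, or closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_links(text: str) -> List[str]:
    """Return the distinct HTTP(S) links in text, in order of first appearance."""
    seen = set()
    links = []
    for match in URL_PATTERN.finditer(text):
        # Drop trailing sentence punctuation that the pattern picks up.
        url = match.group(0).rstrip(".,;:")
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links


def link_table(papers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Build (paper_id, url) rows for every link found in every paper."""
    return [
        (paper_id, url)
        for paper_id, text in papers.items()
        for url in extract_links(text)
    ]
```

Writing the rows from `link_table` to a CSV file would give the "big list" of links together with the papers they came from.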

Herbert Van de Sompel did something similar in http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0115253 but didn't do step 3, and I think step 1 covered only Elsevier and Springer articles.

Does anyone know of datasets for step 1?

sckott commented 7 years ago

Regarding step 1: any thoughts, @eseiver @rossmounce?

rossmounce commented 7 years ago

Cool idea, would make a useful ongoing resource.

Reading through the Klein/Van de Sompel paper, didn't they also cover PMC? I think the extracted PMC links can be found in their supplementary materials here: https://figshare.com/articles/PMC_Memento_data/1132673 I'm not too familiar with the file format, though: .pkl?
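The `.pkl` extension usually indicates a Python pickle file, although that has not been confirmed for this particular download. A minimal sketch for peeking at such a file follows. The helper names `load_pickle` and `describe` are made up for illustration, and the structure of the figshare data is unknown, so the code only reports the top-level type and size. Unpickling can execute arbitrary code, so this should only be run on files from a trusted source.

```python
import pickle
from pathlib import Path
from typing import Any


def load_pickle(path: str) -> Any:
    """Deserialize one object from a pickle file. Only use on trusted files."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


def describe(obj: Any) -> str:
    """Summarize the top-level type and size of a loaded object."""
    size = len(obj) if hasattr(obj, "__len__") else "n/a"
    return f"{type(obj).__name__} with length {size}"
```

Usage would look like `print(describe(load_pickle("pmc_links.pkl")))`, where `pmc_links.pkl` is a placeholder file name. If the file was written by Python 2, passing `encoding="latin1"` to `pickle.load` may be needed.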

rossmounce commented 7 years ago

Further thoughts.

Even step 1 is challenging. Everything is fragmented and there is no one-stop shop, although for biomedical research PMC obviously does a great job (https://www.ncbi.nlm.nih.gov/pmc/tools/openftlist/).

CORE has over 5 million OA full texts: https://blog.core.ac.uk/2017/02/03/core-now-offers-5-millions-of-open-access-full-text-research-papers/

There's also Bielefeld Academic Search Engine (BASE): https://www.base-search.net/

eseiver commented 7 years ago

I maintain a zip of all PLOS research articles in XML, updated daily, at https://drive.google.com/open?id=0B_JDnoghFeEKLTlJT09IckMwOFk

Across journals, as @rossmounce mentioned, PMC is probably the best resource. Wish their file names were DOIs and not PMIDs though.
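For a zip of article XML like the one above, links could be pulled out per article roughly as follows. This is a sketch under assumptions: it assumes the files follow the JATS convention of `ext-link` elements carrying an `xlink:href` attribute and a DOI in `article-id[@pub-id-type='doi']`, and the function names are hypothetical. Keying rows by the DOI read from the XML also sidesteps relying on file names. Links that appear only as plain text, outside `ext-link`, would be missed.

```python
import xml.etree.ElementTree as ET
import zipfile
from typing import Iterator, List, Tuple

# Fully qualified name of the xlink:href attribute.
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def links_from_jats(xml_bytes: bytes) -> Tuple[str, List[str]]:
    """Return (doi, http links) for one article, assuming JATS-style markup."""
    root = ET.fromstring(xml_bytes)
    doi_node = root.find(".//article-id[@pub-id-type='doi']")
    doi = doi_node.text.strip() if doi_node is not None and doi_node.text else ""
    links = [
        element.get(XLINK_HREF)
        for element in root.iter("ext-link")
        if element.get(XLINK_HREF, "").startswith("http")
    ]
    return doi, links


def links_from_zip(zip_file) -> Iterator[Tuple[str, str]]:
    """Yield (doi, url) rows for every .xml member of a zip archive."""
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            if name.endswith(".xml"):
                doi, links = links_from_jats(archive.read(name))
                for url in links:
                    yield doi, url
```

`links_from_zip` accepts either a path or a file-like object, so it can be pointed at a downloaded archive directly.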