Closed by hammer 5 years ago
Oh there's also https://github.com/ropensci/rentrez. They don't know how to parse efetch's weird JSON-ish data either.
For extra fun we could try to pull down full text, cf. https://www.ncbi.nlm.nih.gov/pmc/tools/get-full-text/ and https://www.ncbi.nlm.nih.gov/pmc/tools/ftp.
I used to parse the clinical trials XML archive from NCI, and all I can say is that even if you can parse the XML, its syntax has lots of weird edge cases. I ended up using one of the standard XML-to-JSON solutions and always worked with the converted JSON to stay sane. There was a nice xml2json Node library that could do this about five years ago. I assume there are even better ones now, and IMHO that would be the way to go.
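For what it's worth, here is a rough sketch of the convert-first approach in Python using only the standard library. This is not the xml2json library mentioned above; `element_to_dict`, `xml_to_json`, the `@`/`#text` key convention, and the sample document are all made up for illustration. It also ignores mixed-content tail text, which is exactly the kind of edge case a dedicated converter would handle better.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into plain dicts, lists, and strings.

    Assumed convention (not a standard): attributes become "@name" keys,
    repeated child tags become lists, and text next to children or
    attributes is stored under "#text". Tail text is ignored.
    """
    node = {f"@{key}": value for key, value in elem.attrib.items()}
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # Second or later occurrence of a tag: collect values in a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element with no attributes: just its text
    if text:
        node["#text"] = text
    return node


def xml_to_json(xml_text):
    """Parse an XML string and return it as a JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


# Hand-written stand-in document, not real clinical trials data.
SAMPLE = '<trial id="X1"><title>Example</title><site>A</site><site>B</site></trial>'

if __name__ == "__main__":
    print(xml_to_json(SAMPLE))
```

After the conversion, everything downstream works with ordinary dicts and lists, so the XML quirks only need handling in one place.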
Docs: E-utilities Quick Start
R client: https://github.com/gschofl/reutils (uses R5/Reference classes object system, yikes)
Example efetch query for a single PMID:
https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=22368089
It returns some weird thing that looks like typed JSON but isn't, so I don't know how to parse it; going to have to work w/ XML, sadly.
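As far as I can tell, the JSON-ish thing is NCBI's ASN.1 text format, which is what efetch returns for `db=pubmed` when no `retmode` is given; adding `retmode=xml` to the same query asks for XML instead. Below is a minimal Python sketch of the XML route, standard library only. The helper names (`efetch_url`, `parse_articles`, `fetch`) and the trimmed sample record are hypothetical, and the parser pulls out only the PMID, title, and abstract.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(pmid):
    """Build an efetch URL that requests XML rather than the default output."""
    query = urllib.parse.urlencode({"db": "pubmed", "id": pmid, "retmode": "xml"})
    return f"{EFETCH}?{query}"


def _text(elem):
    # itertext() keeps text inside inline markup such as <i> or <sub>.
    return "".join(elem.itertext()).strip() if elem is not None else None


def parse_articles(xml_text):
    """Extract PMID, title, and abstract from a PubmedArticleSet document."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        parts = [_text(p) for p in article.iterfind(".//Abstract/AbstractText")]
        records.append({
            "pmid": article.findtext("MedlineCitation/PMID"),
            "title": _text(article.find(".//ArticleTitle")),
            "abstract": " ".join(parts) or None,
        })
    return records


def fetch(pmid):
    """Download and parse one record (needs network access)."""
    with urllib.request.urlopen(efetch_url(pmid)) as resp:
        return parse_articles(resp.read())


# Hand-written stand-in record, trimmed to the fields used above; not real data.
SAMPLE = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID Version="1">0</PMID>
      <Article>
        <ArticleTitle>Placeholder <i>title</i></ArticleTitle>
        <Abstract>
          <AbstractText Label="BACKGROUND">First part.</AbstractText>
          <AbstractText Label="RESULTS">Second part.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

if __name__ == "__main__":
    print(efetch_url("22368089"))
    print(parse_articles(SAMPLE))
```

Calling `fetch("22368089")` would run the same parser against the live response. Structured abstracts arrive as several `AbstractText` elements, so the sketch joins them with spaces.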