ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Robust way to return DOI for a given Pubmed ID? #136

Open iainmwallace opened 5 years ago

iainmwallace commented 5 years ago

Hi,

I was wondering if anyone might have a suggestion to return the DOI for a given pubmed id in a robust manner?

I tried to use the ID converter, but it reports a number of my PubMed IDs as invalid. For example, 26479441 is a valid PubMed ID, but the converter rejects it: https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/?tool=my_tool&email=my_email@example.com&ids=26479441

The closest I have gotten is to use entrez_fetch and parse out the ELocationID element, which, if present, contains the DOI. It may also contain other identifiers, though, and I am not clear on how to exclude those when parsing the XML; see the example code below.

I have (I believe) also come across PubMed IDs that have a DOI but no ELocationID XML tag.

Any pointers/tips would be greatly appreciated.

Iain

example parsing from https://github.com/ropensci/rentrez/issues/100

library(rentrez)
library(XML)

return_doi <- function(pubmed_id) {
  entrez_xml <- entrez_fetch(db = "pubmed", id = pubmed_id, rettype = "xml")
  parsed_xml <- XML::xmlParse(entrez_xml)
  elocation_id <- XML::xpathSApply(parsed_xml, "//ELocationID", XML::xmlValue)
  return(elocation_id)
}

return_doi(26479441) # should be 10.1038/nchembio.1936

> [1] "10.1038/nchembio.1936"

return_doi(28917822) # should be "10.1016/j.drudis.2017.09.004"

> [1] "S1359-6446(17)30102-2" "10.1016/j.drudis.2017.09.004"

dwinter commented 5 years ago

Hi @iainmwallace ,

I think this is one case where you can use the parse_pubmed_xml function.

rec <- parse_pubmed_xml(entrez_fetch(26479441, db="pubmed", parsed = TRUE, rettype = "xml"))
rec$doi
[1] "10.1038/nchembio.1936"

Alternatively, the xpath used by the internal part of this function is .//PubmedData/ArticleIdList/ArticleId[@IdType='doi']
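
For anyone who wants to apply that XPath directly, here is a minimal sketch using the XML package from the original question. The `extract_doi` helper and the `sample_xml` string are hypothetical: the sample is hand-written to mimic the shape of a PubMed efetch record (it is not a real NCBI response), and labelling the non-DOI ELocationID as `EIdType="pii"` is an assumption based on the value shown earlier in this thread. The sketch tries the ArticleId path first and only falls back to ELocationID, filtered on its `EIdType` attribute, if nothing is found.

```r
library(XML)

# Hand-written sample that mimics the layout of a PubMed efetch record.
# It is NOT a real NCBI response; the identifier values are the ones
# quoted earlier in this thread, and EIdType="pii" is an assumption.
sample_xml <- '<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>28917822</PMID>
      <Article>
        <ELocationID EIdType="pii">S1359-6446(17)30102-2</ELocationID>
        <ELocationID EIdType="doi">10.1016/j.drudis.2017.09.004</ELocationID>
      </Article>
    </MedlineCitation>
    <PubmedData>
      <ArticleIdList>
        <ArticleId IdType="pubmed">28917822</ArticleId>
        <ArticleId IdType="doi">10.1016/j.drudis.2017.09.004</ArticleId>
      </ArticleIdList>
    </PubmedData>
  </PubmedArticle>
</PubmedArticleSet>'

# Hypothetical helper: return the DOI(s) found in a PubMed XML string,
# or NA_character_ if the record carries no DOI in either location.
extract_doi <- function(xml_text) {
  doc <- XML::xmlParse(xml_text, asText = TRUE)

  # Preferred location: the article's own ArticleIdList.
  doi <- XML::xpathSApply(
    doc, "//PubmedData/ArticleIdList/ArticleId[@IdType='doi']", XML::xmlValue
  )

  # Fallback: ELocationID, restricted to entries flagged as DOIs so that
  # other identifier types are excluded.
  if (length(doi) == 0) {
    doi <- XML::xpathSApply(doc, "//ELocationID[@EIdType='doi']", XML::xmlValue)
  }

  if (length(doi) == 0) NA_character_ else unique(doi)
}

extract_doi(sample_xml)
```

With a live record, the same helper should accept the text returned by `entrez_fetch(db = "pubmed", id = 28917822, rettype = "xml")`. Fetching several IDs in one call would return the DOIs without their PMIDs, so for batches it is probably safer to loop over records.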

JimHokanson commented 4 years ago

I recently talked with NLM support, and apparently the ID converter was written by "the PMC team." Specifically, an article needs a valid PMCID before the converter can go from PMID to DOI. If you use the web interface and request HTML, the error "Identifier not found in PMC" makes this clear. I mentioned to them that this is not clear with all of the return types (I was testing JSON), and that going from PMID to DOI shouldn't require a PMCID. I asked them to update the documentation or the code. The support person said they would pass the issue on but made no guarantees.

I'm seriously considering starting a small web app/REST API that maps between these IDs in a simple way. Neither NLM nor CrossRef, as far as I can recall, makes this as simple as it should be.

iainmwallace commented 4 years ago

https://web.hypothes.is/ might also be an interesting service to use to solve the problem of mapping from a url back to a doi.

JspSrs commented 1 year ago

Unfortunately, it took me a while to realize that the errors I got today from https://www.ncbi.nlm.nih.gov/pmc/tools/idconv/ came from a subset of the DOIs for which I want a PMID. These DOIs have a valid PMID but, being absent from PMC, have no PMCID. On a more careful reading, the PMC requirement is mentioned here and there.

I currently want to update a list of 500+ references that have a DOI with their PubMed equivalent (to be used in a BED-detail file). So far my options seem to be single-line browser queries, like the examples mentioned above. I am surprised that I could not find any batch tool to convert DOI to PMID on https://www.ncbi.nlm.nih.gov/. I support an extension of the converter OR a clearer statement that the PMCID is the leading ID in this tool.

Ending on a positive note: I just started with eutils, and the first result was encouraging.
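
A rough sketch of what a batch DOI-to-PMID lookup through eutils could look like with rentrez. It is an illustration under stated assumptions, not the approach actually used in the comment above. The helpers `doi_to_term` and `dois_to_pmids` are hypothetical names. The sketch assumes that searching PubMed with the DOI quoted and tagged with the Article Identifier field (`[AID]`) returns the matching record, and the 0.4 second pause is a conservative guess at staying under NCBI's rate limit for requests made without an API key.

```r
library(rentrez)

# Hypothetical helper: build a PubMed search term that restricts the
# match to the Article Identifier field.
doi_to_term <- function(doi) {
  sprintf('"%s"[AID]', doi)
}

# Hypothetical helper: look up one PMID per DOI, one esearch call each.
# Returns a named character vector; DOIs with zero or multiple hits get NA
# so that ambiguous matches can be checked by hand.
dois_to_pmids <- function(dois, pause = 0.4) {
  vapply(dois, function(doi) {
    res <- entrez_search(db = "pubmed", term = doi_to_term(doi))
    Sys.sleep(pause)  # assumed-conservative delay between requests
    if (length(res$ids) == 1) res$ids else NA_character_
  }, character(1))
}

# Example call (needs network access), using a DOI from this thread:
# dois_to_pmids("10.1038/nchembio.1936")
```

For a list of 500+ DOIs this makes one request per DOI, so it will take a few minutes; any NA results are worth checking by hand.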