ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Discrepancy between PubMed Search results and entrez_search() #150

Closed: rachitest closed this issue 4 years ago

rachitest commented 4 years ago

Hi,

I have been trying to gather the DOIs for multiple PMIDs via entrez_search(), but I've noticed that the results from entrez_search() differ from the PubMed website's results for the same search term.

For example,

If I use the search term "Biol. Chem. 383:1519-1536 (2002)" directly in PubMed, I get: https://pubmed.ncbi.nlm.nih.gov/12452429/

However, using the same search term with entrez_search() results in:

> entrez_search(db='pubmed', term = 'Biol. Chem. 383:1519-1536 (2002)')
Entrez search result with 0 hits (object contains 0 IDs and no web_history object)
 Search term (as translated):  ((("Biol Chem"[Journal] OR ("biol"[All Fields] AND ...

I have even tried using the exact query that I retrieved from PubMed's search history, but without success:

> entrez_search(db='pubmed', term = '(("biol chem"[Journal] OR ("biol"[All Fields] AND "chem"[All Fields])) OR "biol chem"[All Fields]) AND "383"[All Fields] AND "1519-1536"[All Fields] AND "2002"[All Fields]')
Entrez search result with 0 hits (object contains 0 IDs and no web_history object)
 Search term (as translated):  (("biol chem"[Journal] OR ("biol"[All Fields] AND  ...

I would appreciate any help understanding what exactly is causing this discrepancy and whether it can be fixed.

dwinter commented 4 years ago

Hmm, that's a bit of a mystery. I know the web Entrez interface is not exactly the same as the EUtils API that rentrez uses, but entering a search term built with the web query builder produces no hits even though it is parsed correctly:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=((%22biological%20chemistry%22[Journal])%20AND%20(383[Issue]))%20AND%20(1519-1536[Pagination])

This might be one to direct to the NCBI helpdesk?
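To experiment with the fielded search above from R, a small helper can assemble the same kind of term. This is a minimal sketch: `build_citation_term()` is a hypothetical name, not part of rentrez, and the tag layout simply mirrors the EUtils URL above (which, as noted, returned no hits at the time).

```r
# Hypothetical helper (not part of rentrez): assemble a fielded PubMed query
# using the [Journal], [Issue] and [Pagination] tags, mirroring the URL above.
build_citation_term <- function(journal, issue, pages) {
  paste0('("', journal, '"[Journal]) AND (', issue, '[Issue]) AND (',
         pages, '[Pagination])')
}

term <- build_citation_term("biological chemistry", 383, "1519-1536")
print(term)

# Sending the term through rentrez requires network access:
# library(rentrez)
# res <- entrez_search(db = "pubmed", term = term)
# res$count
```

Keeping the term construction in one function makes it easy to loop over many citations and to try alternative page formats without retyping the query.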

rachitest commented 4 years ago

Thanks for looking into it. I've submitted a ticket to the NCBI helpdesk and will post updates as the situation progresses.

rachitest commented 4 years ago

Per the NCBI helpdesk, PubMed Search supports fuzzy matching while the EUtils API does not, which is what leads to the difference. Supposedly, formatting the page numbers in the following way:

1234-56

vs

1234-1256

will produce a working EUtils API search (I have not managed to reproduce this on a decently sized data set, so I'm taking it with a grain of salt).
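If the helpdesk's advice holds, page ranges could be rewritten before building the query. The sketch below assumes the abbreviated form (e.g. 1234-56) is the one EUtils matches, which is how the advice reads given that the full range 1519-1536 failed earlier in this thread; `abbreviate_pages()` is a hypothetical helper, not part of rentrez, and per the caveat above this has not been confirmed on a larger data set.

```r
# Hypothetical helper (not part of rentrez): rewrite a full page range such as
# "1519-1536" into the abbreviated form "1519-36" by dropping the leading
# digits the end page shares with the start page.
abbreviate_pages <- function(pages) {
  parts <- strsplit(pages, "-", fixed = TRUE)[[1]]
  if (length(parts) != 2) return(pages)  # leave single pages / odd input alone
  first <- parts[1]
  last <- parts[2]
  # Only abbreviate when both ends have the same number of digits
  if (nchar(first) != nchar(last)) return(pages)
  a <- strsplit(first, "")[[1]]
  b <- strsplit(last, "")[[1]]
  # Count shared leading characters, always keeping at least one end digit
  shared <- 0
  while (shared < length(a) - 1 && a[shared + 1] == b[shared + 1]) {
    shared <- shared + 1
  }
  paste0(first, "-", substr(last, shared + 1, nchar(last)))
}

abbreviate_pages("1519-1536")
abbreviate_pages("1234-1256")
```

To convert several citations at once, `vapply(pages, abbreviate_pages, character(1))` applies it element-wise to a character vector.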