ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

inconsistency with search result count numbers #111

Closed: timedreamer closed this issue 7 years ago

timedreamer commented 7 years ago

Hi, I really like this package. I noticed something strange today: the number of papers returned by a PubMed query spanning two (or several) years does not equal the sum of the counts from querying each year separately. My code is pasted below. I have queried SRA before without any problems.

I tried three journals, and here are the results. Only Nature Communications returns a range count equal to the sum of the two single-year counts.

| Journal | 2011 | 2012 | 2011:2012 |
| --- | --- | --- | --- |
| Plant Physiology | 622 | 566 | 1100 |
| JACS | 3386 | 3228 | 6413 |
| Nature Communications | 445 | 703 | 1148 |

Code:

library(rentrez)

# Count-only queries: retmax = 0 returns no IDs, just the number of hits
entrez_search(db = "pubmed", term = "Plant physiology[JOUR] AND 2011[PDAT]", retmax = 0)$count
entrez_search(db = "pubmed", term = "Plant physiology[JOUR] AND 2012[PDAT]", retmax = 0)$count
entrez_search(db = "pubmed", term = "Plant physiology[JOUR] AND (2011:2012[PDAT])", retmax = 0)$count

Any ideas?

Thanks!

Ji

dwinter commented 7 years ago

Hi @timedreamer,

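I gather this is the result of some journals having advance online publication, with both the online and in-issue dates being included in their records. A paper published online in 2011 and in print in 2012 would then match both single-year searches but be counted only once in the 2011:2012 range, which would explain why the range count is lower than the sum. Depending on exactly what you are trying to do, there are a few ways around this.

If you are using the dates to "batch" jobs that would otherwise have too many records to deal with, you can instead use EDAT, the date each record was made available in Entrez.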

twenty_eleven <- entrez_search(db="pubmed", term= "Plant physiology[JOUR] AND 2011[EDAT]")
twenty_twelve <- entrez_search(db="pubmed", term= "Plant physiology[JOUR] AND 2012[EDAT]")
both <- entrez_search(db="pubmed", term= "Plant physiology[JOUR] AND 2011:2012[EDAT]")
twenty_eleven$count + twenty_twelve$count
[1] 1004
both$count
[1] 1004
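
For comparison, the size of the mismatch in the PDAT counts can be worked out from the numbers in the question. This is only a sketch of the arithmetic, assuming the 2011:2012 range search returns the union of the two single-year result sets, so that the shortfall equals the number of records matching both years. The data frame and its column names are made up for illustration; the counts are copied from the table above.

```r
# PDAT counts as reported in the question (not re-queried here).
counts <- data.frame(
  journal = c("Plant Physiology", "JACS", "Nature Communications"),
  y2011   = c(622, 3386, 445),
  y2012   = c(566, 3228, 703),
  both    = c(1100, 6413, 1148)
)

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|, so the number of
# records matching both years is the sum of the single-year counts minus
# the range count.
counts$overlap <- counts$y2011 + counts$y2012 - counts$both

print(counts[, c("journal", "overlap")])
# overlap column: 88, 201, 0
```

If the double-dating explanation holds, a count-only search for "Plant physiology[JOUR] AND 2011[PDAT] AND 2012[PDAT]" would be expected to return roughly the Plant Physiology overlap, although that has not been checked here and the counts will have drifted since the question was asked.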

If you are actually interested in the publication date, you might need to extract information on each paper using entrez_summary or even entrez_fetch.
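
A rough sketch of the entrez_summary route follows. It assumes the PubMed summary records expose "pubdate" and "epubdate" fields (the latter empty when there is no electronic publication date), which is worth confirming by inspecting one summary first. The helpers year_of and publication_dates are hypothetical names introduced here, not part of rentrez.

```r
# Hypothetical helper: take the leading four-digit year from a date string
# such as "2012 Jan" or "2011 Dec 15"; gives NA when no year is present.
year_of <- function(x) {
  as.integer(ifelse(grepl("^[0-9]{4}", x), substr(x, 1, 4), NA))
}

# Hypothetical helper: fetch summaries for the first `retmax` hits of a
# PubMed search and tabulate print and electronic publication years.
# Needs the rentrez package and network access when called.
publication_dates <- function(term, retmax = 20) {
  hits <- rentrez::entrez_search(db = "pubmed", term = term, retmax = retmax)
  summaries <- rentrez::entrez_summary(db = "pubmed", id = hits$ids,
                                       always_return_list = TRUE)
  # Assumed field names; check names(summaries[[1]]) if these are missing.
  pubdate  <- rentrez::extract_from_esummary(summaries, "pubdate")
  epubdate <- rentrez::extract_from_esummary(summaries, "epubdate")
  data.frame(
    pmid      = names(pubdate),
    pub_year  = year_of(pubdate),
    epub_year = year_of(epubdate),
    stringsAsFactors = FALSE
  )
}

# Example call (not run here):
# d <- publication_dates("Plant physiology[JOUR] AND 2011:2012[PDAT]")
# subset(d, !is.na(epub_year) & epub_year != pub_year)
```

Rows where the two years differ are the records that would be picked up by both single-year PDAT searches. The default retmax is kept small on purpose; summarising a full year of a journal would need the IDs split into batches or a web history object.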

Happy to keep this issue open if you have more questions about this workflow.

timedreamer commented 7 years ago

Cool. That answers my question. I will use EDAT instead.

I do not have further questions regarding this. I will close the issue.

Thank you!