ropensci / rentrez

talk with NCBI entrez using R
https://docs.ropensci.org/rentrez

Downloading large data sets #139

Open kevinchen27 opened 5 years ago

kevinchen27 commented 5 years ago

I'm trying to download a large number of records (10,000, to be exact) and followed the tutorial to do this. First, I run the following code to save the web history:

pubmed_search <- entrez_search(db = "pubmed",
                               term = "Case Reports[Filter] AND cardiovascular disease AND English[lang] AND 2009:2019[PDat]",
                               retmax = 792711,
                               use_history = TRUE)

Then, I try to download the first 10,000 records:

for (seq_start in seq(1, 10000, 100)) {
    recs <- entrez_summary(db = "pubmed",
                           web_history = pubmed_search$web_history,
                           retmax = 100,
                           retstart = seq_start)
    cat(seq_start + 99, "sequences downloaded\r")
}
length(recs)

But I only get 100 records, not 10,000. Can someone help with this? I'm quite confused about how to use the web history feature.

dwinter commented 5 years ago

Hi Kevin,

This for loop overwrites recs on every pass, so at the end you are left with only the last batch of 100. You will want to append each batch to a growing list, or use lapply to return a list of lists.
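
A minimal sketch of the first approach, assuming the pubmed_search object from the entrez_search call above is still available; all_recs is an illustrative name rather than anything from the original code, and each batch of summaries is appended to it instead of replacing recs:

library(rentrez)

all_recs <- list()                      # accumulator for every batch of summaries
for (seq_start in seq(1, 10000, 100)) {
    batch <- entrez_summary(db = "pubmed",
                            web_history = pubmed_search$web_history,
                            retmax = 100,
                            retstart = seq_start)
    all_recs <- c(all_recs, batch)      # append this batch instead of overwriting
    cat(seq_start + 99, "records downloaded\r")
}
length(all_recs)                        # should now be close to 10,000

The lapply variant does the same thing in one expression: run the entrez_summary call for each retstart in seq(1, 10000, 100) inside lapply, then flatten the resulting list of esummary lists with do.call(c, ...).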