As a subsample of 1000 ids is too large to feed directly into entrez_fetch, the only way I can see to handle this is to upload the subsampled ids in chunks with entrez_post, and then download each chunk with entrez_fetch.
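For reference, the chunked upload-and-fetch approach I described looks roughly like this (the chunk size, output filename, and the assumption that `ids` already holds the subsampled ids are placeholders):

```r
library(rentrez)

# Split the subsampled ids into chunks of 100 (chunk size is arbitrary)
ids_chunked <- split(ids, ceiling(seq_along(ids) / 100))

# Post each chunk and fetch it straight back, appending to one file
for (chunk in ids_chunked) {
  upload <- entrez_post(db = "nuccore", id = chunk)
  dl <- entrez_fetch(db = "nuccore", web_history = upload, rettype = "fasta")
  cat(dl, file = "out.fa", append = TRUE)
}
```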
However, this is quite slow. The documentation for entrez_post seems to suggest that I should instead be able to append the ids to an existing web_history object, so that the entire set could then be downloaded in a single entrez_fetch call. I tried this with the code below, but entrez_fetch only downloads the last chunk of 100 ids I uploaded:
# Create a new web_history object from the first chunk
upload <- entrez_post(db = "nuccore", id = ids_chunked[[1]])
# Append the remaining chunks to the same web_history object
for (l in 2:length(ids_chunked)) {
  upload <- entrez_post(db = "nuccore", id = ids_chunked[[l]], web_history = upload)
}
dl <- entrez_fetch(db = "nuccore", web_history = upload, rettype = "fasta", retmax = 10000)
cat(dl, file = "out.fa", append = FALSE)
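My best guess is that each entrez_post into an existing WebEnv adds a new query key rather than extending the old one, so the returned web_history points only at the most recent chunk. If that is the case, perhaps each chunk could be fetched from the same WebEnv by query key. This is an untested sketch that assumes the query keys are numbered 1..n in posting order and that the web_history object's QueryKey field can be overwritten directly:

```r
# Untested sketch: fetch each posted chunk by its query key, assuming
# the n calls to entrez_post created query keys 1, 2, ..., n in the
# same WebEnv.
for (k in seq_along(ids_chunked)) {
  upload$QueryKey <- k  # web_history objects expose $WebEnv and $QueryKey
  dl <- entrez_fetch(db = "nuccore", web_history = upload, rettype = "fasta")
  cat(dl, file = "out.fa", append = TRUE)
}
```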
Do you have any input on what I am doing wrong here, or suggestions for a better approach (i.e. can I somehow subset the web_history object directly on the NCBI server without having to post the ids again)?
I have exactly the same problem: I tried to add IDs to an existing web_history object, but only the last chunk could be fetched. @alexpiper, has there been any success since then?
Hi rentrez developers,
I have a situation where I want to search for a taxon and a specific gene, and then download only a random subsample of the search results.
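A minimal sketch of that search-and-subsample step (the search term, retmax, sample size, and chunk size are all placeholders):

```r
library(rentrez)

# Search for a taxon and gene, returning the matching ids
# (term shown is a made-up example)
res <- entrez_search(db = "nuccore",
                     term = "Drosophila[ORGN] AND COI[GENE]",
                     retmax = 10000)

# Take a random subsample of the hits and chunk them for upload
ids <- sample(res$ids, 1000)
ids_chunked <- split(ids, ceiling(seq_along(ids) / 100))
```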
Cheers, Alex