chemoton opened this issue 5 days ago
I think you're probably still having rate-limiting issues. You are currently circumventing the rate-control mechanisms of entrez_fetch() with the for loop. Instead of sending a bunch of requests with one ID each, it would be better to make a few requests (or possibly just one, depending on how many IDs you have), each with many IDs. entrez_fetch() is vectorized.
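Something along these lines, for example (a rough sketch; the chunk size of 200 and the assumption that coi1[, 1] holds the taxIDs are placeholders):

library(rentrez)

ids    <- coi1[, 1]
chunks <- split(ids, ceiling(seq_along(ids) / 200))  # batches of up to 200 IDs

lineages <- unlist(lapply(chunks, function(chunk) {
  rec <- entrez_fetch(db="taxonomy", id=chunk, rettype="xml", parsed=TRUE)
  Sys.sleep(0.1)  # small pause between the (few) remaining requests
  # one <Taxon> node per ID in the returned document; pull the Lineage of each
  XML::xpathSApply(rec, "/TaxaSet/Taxon/Lineage", XML::xmlValue)
}))

Each request then returns a single XML document with one <Taxon> record per ID, so most of the rate-limit pressure goes away. Note that records without a Lineage field (such as the root node, taxID 1) are simply dropped by that XPath, so handle that case separately if you need it.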
Also, do you have an API key (https://www.biostars.org/p/299812/)? Some information on rate limits and API keys can be found at https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/.
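If you do have one, registering it for the session should look roughly like this (the key string below is just a placeholder):

library(rentrez)
set_entrez_key("0123456789abcdef0123456789abcdef0123")  # placeholder key
# with a registered key, NCBI allows up to 10 requests per second instead of 3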
Hi,
I am trying to get the lineage data for a set of taxIDs. When I try it with entrez_fetch(), it works fine:
tax_rec <- entrez_fetch(db="taxonomy", id=coi1[1,1], rettype="xml", parsed=TRUE)
where coi1 is a data frame whose first column holds the taxIDs. However, when I try to loop through all IDs in the rows, it always fails with an error at a random index: for example, it got through 46 records the first time and 89 the second. I have tried to play around with the httr::GET config as suggested in another issue (#87), but it did not help. I have doubts that I even used it appropriately, as I could not find usage examples; the code ran, but it still produced the same error at a random index.
I have found the following info in the wiki: Slowing rentrez down when you hit the rate-limit: rentrez won't let you send requests to the NCBI at a rate higher than the rate limit, but it is sometimes possible that they will arrive too close together and produce errors. If you are using rentrez functions in a for loop and find rate-limiting errors are occurring, you may consider adding a call to Sys.sleep(0.1) before each message sent to the NCBI. This will ensure you stay below the rate limit.
So I included it in my loop, but it did not solve the issue either. As the individual requests always work, I highly doubt it is a network or NCBI issue.
Full code for looping through IDs:
library(rentrez)  # entrez_fetch(); XML is used via XML:: below

y <- list()
for (i in 1:nrow(coi1)) {
  if (coi1[i,1] == 1) {
    # taxID 1 (the root of the NCBI taxonomy) has no Lineage field, so keep the name instead
    tax_rec <- entrez_fetch(db="taxonomy", id=coi1[i,1], rettype="xml", parsed=TRUE)
    tax_list <- XML::xmlToList(tax_rec)
    y[[i]] <- tax_list$Taxon$ScientificName
    Sys.sleep(0.1)
  } else {
    tax_rec <- entrez_fetch(db="taxonomy", id=coi1[i,1], rettype="xml", parsed=TRUE)
    tax_list <- XML::xmlToList(tax_rec)
    y[[i]] <- tax_list$Taxon$Lineage
    Sys.sleep(0.1)
  }
}
Any input/help appreciated