Open janstrauss1 opened 4 years ago
There seems to be a related issue for the taxize package:
https://github.com/ropensci/taxize/issues/785#issuecomment-554462753
Are you definitely using NCBI? The data source in question in taxize issue 785 is Catalogue of Life, not NCBI. Anyway, NCBI may also throw 429 errors. Do you have an NCBI Entrez API key set with the env var ENTREZ_KEY?
@sckott,
yes, I'm definitely using NCBI taxon IDs.
No, I did not set an ENTREZ_KEY, but I think this might solve the problem. I have already obtained an NCBI API key, but how do I set it correctly?
Many thanks in advance for your help!
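A minimal sketch of setting the key for the current R session and checking that it is visible. The key value is a placeholder, and `key_is_set` is just an illustrative variable name. The assumption, taken from the comment above, is that taxize picks the key up from the `ENTREZ_KEY` environment variable.

```r
# Set the key for the current session only (placeholder value, use your own key).
Sys.setenv(ENTREZ_KEY = "my.api.key")

# Check that the variable is visible to R; this only tests that it is
# non-empty, not that NCBI accepts the key.
key_is_set <- nzchar(Sys.getenv("ENTREZ_KEY"))

# To keep the key across sessions, a line of the form
#   ENTREZ_KEY=my.api.key
# can be added to ~/.Renviron, which R reads at startup.
```

Keeping the key in `~/.Renviron` rather than in the script also avoids committing it to a repository by accident.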
@sckott,
I just set the key using Sys.setenv(ENTREZ_KEY = "my.api.key")
as you outlined at https://github.com/ropensci/taxa/issues/135#issuecomment-370862861.
It seems to partially solve my issue, as the download stalled at 7% with the error: Error: Bad Request (HTTP 400).
Any idea how to address this?
It appears that downloading the classifications for such a long list of taxon IDs from NCBI is very fragile. After setting my NCBI API key and re-running my script as outlined above, the download now stalled at 25% with the error: Bad Gateway (HTTP 502).
It eventually worked to download the classifications of the full 17k list of NCBI taxon IDs.
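For long ID lists like this, one way to make a failure less costly is to query in batches and pause between them, so a stalled run only loses one batch. The sketch below is an illustration and not part of taxize or metacoder: `in_batches`, `fake_fetch`, the batch size and the pause are all made-up names and values, and `fake_fetch` stands in for the real lookup so the example runs offline.

```r
# Hypothetical helper: split a vector of IDs into batches, call `fetch` on
# each batch, and pause between batches to stay under the rate limit.
in_batches <- function(ids, fetch, batch_size = 100, pause = 1) {
  batches <- split(ids, ceiling(seq_along(ids) / batch_size))
  results <- vector("list", length(batches))
  for (i in seq_along(batches)) {
    results[[i]] <- fetch(batches[[i]])
    if (i < length(batches)) Sys.sleep(pause)  # no pause after the last batch
  }
  do.call(c, results)  # concatenate the per-batch lists into one list
}

# Stand-in for the real lookup: returns one named list element per ID.
fake_fetch <- function(batch) {
  setNames(as.list(paste0("lineage_", batch)), batch)
}

demo_ids <- as.character(1:250)
demo_out <- in_batches(demo_ids, fake_fetch, batch_size = 100, pause = 0)
```

In real use, `fetch` would be whatever call performs the lookup, for example something like `function(b) taxize::classification(b, db = "ncbi")`. Note that combining results with `c()` returns a plain list, so any class attribute on the per-batch results is dropped.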
NCBI's infrastructure is not very good, so I'm not surprised that you are running into errors with a lot of names.
Another option is taxizedb: the idea is the same as taxize, but it uses SQL dumps on your local machine.
I have been running into an issue for some time now trying to parse my data with lookup_tax_data. I have around 4k tax_ids and I want to visualize them together with their fraction of total reads in a heat tree.
This is what I run:
Sys.setenv(ENTREZ_KEY = "my key")
data15 <- read.delim("path to my file")
taxed_15 <- lookup_tax_data(
  data15,
  "taxon_id",
  column = 2,
  datasets = list("fraction_total_reads"),
  mappings = c("value"),
  database = "ncbi",
  include_tax_data = TRUE,
  use_database_ids = TRUE,
  ask = TRUE
)
I get one of the following errors: Error: Bad Request (HTTP 400) or: Error in get_sort_var(tax_data, names(sort_var)) : No column named ""."
The last error does not show up if I leave out "datasets" and "mappings".
I hope there is a way to solve the problems I am facing.
Is this still not solved? I have the same problem with a list of about 600 species.
Are these errors random, or the same every time? If the latter, can you give me a command to test that causes this error?
I had the same error, and what did the trick for me was to wrap the query in try() and, if the Error: Bad Request (HTTP 400) message appeared, call Sys.sleep() and retry the query. In a loop, it looks like:
for (i in 1:nrow(data)) {
  # first attempt; try() captures the error instead of stopping the loop
  classes_i <- try(tax_name(sci = data$taxon[i], get = c("genus", "family", "order", "class"), db = "ncbi"))
  if (inherits(classes_i, "try-error")) {
    # wait 10 seconds, then retry once
    Sys.sleep(10)
    classes_i <- try(tax_name(sci = data$taxon[i], get = c("genus", "family", "order", "class"), db = "ncbi"))
  }
  classes_both <- rbind(classes_both, classes_i)
}
Thanks morellek, the loop worked for me. I was getting frustrated that even after getting the NCBI API key and using Sys.sleep in my similar loop I still got the Error: Bad Request (HTTP 400) message. I still get some rows filled with the error message, but that I can fix.
PS: classes_both = NULL before the loop is missing.
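The loop above retries only once, which would explain why some rows still end up holding an error. A minimal sketch of a more general version, retrying several times with a growing pause, is given below. `with_retry` is a hypothetical helper and not part of taxize, and `flaky_lookup` is a stand-in that simulates the HTTP 400 failure so the example runs without network access.

```r
# Hypothetical helper: call fun(...) up to max_tries times, sleeping longer
# after each failed attempt, and stop with the last error if all attempts fail.
with_retry <- function(fun, ..., max_tries = 5, wait = 10) {
  for (attempt in seq_len(max_tries)) {
    result <- try(fun(...), silent = TRUE)
    if (!inherits(result, "try-error")) {
      return(result)
    }
    if (attempt < max_tries) {
      Sys.sleep(wait * attempt)  # wait, 2 * wait, 3 * wait, ...
    }
  }
  stop("all ", max_tries, " attempts failed; last error: ",
       conditionMessage(attr(result, "condition")))
}

# Stand-in for the remote query: fails twice, then succeeds.
calls <- 0
flaky_lookup <- function(taxon) {
  calls <<- calls + 1
  if (calls < 3) stop("Bad Request (HTTP 400)")
  paste("classification for", taxon)
}

demo_result <- with_retry(flaky_lookup, "Homo sapiens", wait = 0)
```

Inside the loop, the two try() calls could then be replaced by something like `with_retry(tax_name, sci = data$taxon[i], get = c("genus", "family", "order", "class"), db = "ncbi")`. Because the helper stops with an error after the last failed attempt, a taxon that never resolves is reported instead of being bound into the result silently.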
Hi there,
I'm trying to create a taxmap from a long list of NCBI taxon IDs for subsequent filtering. I have downloaded about 17k taxa containing a specific protein domain from InterPro and imported them into R.
I then try to set up my taxmap as follows:
Unfortunately, this throws the error
Error: Too Many Requests (HTTP 429)
I guess the API client is making too many concurrent requests to the database which causes the error.
Could you please help to fix it?
Many thanks in advance!
The output of sessionInfo() is