ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

Taxize always produces an API error after running for 1-2h+ #907

Open GossypiumH opened 1 year ago

GossypiumH commented 1 year ago

Hi,

I have an issue with taxize. I am trying to retrieve the full taxonomy (from Kingdom to Order) for a dataset of 10k+ bacteria (10182 to be exact).

My input is a data frame with a single column of species names (e.g. Xenorhabdus sp.), so my script is very simple:

library(taxize)
library(dplyr)
library(tidyr)

taxa = read.csv(file="/home/jbf/MEGA/KBS-MSU/G820/Abundance_matrixes/test_taxize.txt", sep="\t", header=T, check.names=F)

taxize_options(ncbi_sleep = 1.5)

test = dplyr::tbl_df(cbind(classification(taxa$specie_ID, db="ncbi", rows=1, verbose=TRUE, batch_size=5)))

I tried changing the value in "taxize_options(ncbi_sleep = 1.5)", but it makes no difference: I always end up with an API error like this one:

Retrieving data for taxon 'Janthinobacterium sp.'

Error: {"error":"error forwarding request","api-key":"192.108.190.140","type":"ip", "status":"ok"}

It happens at random after 1 or 2 hours of NCBI requests. I would very much like to have an idea of what is going on and if I did something wrong.

Thank you in advance,
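
For a long online run like the one described above, one way to limit the cost of a random failure is to split the names into chunks and retry a chunk when the request errors, so that a single failure does not discard hours of results. The following is only a sketch under stated assumptions: `fetch_in_chunks` and its arguments are hypothetical names, not part of taxize, and it does not address the cause of the API error itself.

```r
# Hypothetical helper (not part of taxize): apply `fetch` to `items` chunk by
# chunk, retrying a chunk up to `max_tries` times when `fetch` raises an error.
fetch_in_chunks <- function(items, fetch, chunk_size = 50, max_tries = 3,
                            wait = 5, checkpoint = NULL) {
  # Group consecutive items into chunks of at most `chunk_size` elements
  chunks <- split(items, ceiling(seq_along(items) / chunk_size))
  results <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    for (attempt in seq_len(max_tries)) {
      # Capture the error object instead of letting it abort the whole run
      res <- tryCatch(fetch(chunks[[i]]), error = function(e) e)
      if (!inherits(res, "error")) {
        results[i] <- list(res)
        break
      }
      message("Chunk ", i, " failed (attempt ", attempt, "): ",
              conditionMessage(res))
      if (attempt < max_tries) Sys.sleep(wait)
    }
    # Optionally save progress after every chunk so a crash loses little work
    if (!is.null(checkpoint)) saveRDS(results, checkpoint)
  }
  # One element per chunk; chunks that never succeeded stay NULL
  results
}
```

With taxize loaded, `fetch` could be something like `function(x) classification(x, db = "ncbi", rows = 1)`, reusing the arguments from the script above. Chunks that still fail after `max_tries` attempts are left as `NULL` and can be rerun later.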

zachary-foster commented 1 year ago

Does the error always happen on the same taxon, or is it somewhat random? For such a large query, I recommend using taxizedb, which supports offline queries of downloaded databases.

GossypiumH commented 1 year ago

The error is totally random. It can happen after 30 minutes of running or after 2 hours. I have never gotten past the two-hour mark, though; it always fails before then.

My problem is that I can't use taxizedb, because it only accepts taxon IDs as input and I only have names.

zachary-foster commented 1 year ago

Would this work for your purposes?

library(taxizedb)
classification(name2taxid(c('Arabidopsis thaliana', 'pig')))
#> $`3702`
#>                    name         rank      id
#> 1    cellular organisms      no rank  131567
#> 2             Eukaryota superkingdom    2759
#> 3         Viridiplantae      kingdom   33090
#> 4          Streptophyta       phylum   35493
#> 5        Streptophytina    subphylum  131221
#> 6           Embryophyta        clade    3193
#> 7          Tracheophyta        clade   58023
#> 8         Euphyllophyta        clade   78536
#> 9         Spermatophyta        clade   58024
#> 10        Magnoliopsida        class    3398
#> 11      Mesangiospermae        clade 1437183
#> 12       eudicotyledons        clade   71240
#> 13           Gunneridae        clade   91827
#> 14         Pentapetalae        clade 1437201
#> 15               rosids        clade   71275
#> 16              malvids        clade   91836
#> 17          Brassicales        order    3699
#> 18         Brassicaceae       family    3700
#> 19           Camelineae        tribe  980083
#> 20          Arabidopsis        genus    3701
#> 21 Arabidopsis thaliana      species    3702
#> 
#> $`9823`
#>                    name         rank      id
#> 1    cellular organisms      no rank  131567
#> 2             Eukaryota superkingdom    2759
#> 3          Opisthokonta        clade   33154
#> 4               Metazoa      kingdom   33208
#> 5             Eumetazoa        clade    6072
#> 6             Bilateria        clade   33213
#> 7         Deuterostomia        clade   33511
#> 8              Chordata       phylum    7711
#> 9              Craniata    subphylum   89593
#> 10           Vertebrata        clade    7742
#> 11        Gnathostomata        clade    7776
#> 12           Teleostomi        clade  117570
#> 13         Euteleostomi        clade  117571
#> 14        Sarcopterygii   superclass    8287
#> 15 Dipnotetrapodomorpha        clade 1338369
#> 16            Tetrapoda        clade   32523
#> 17              Amniota        clade   32524
#> 18             Mammalia        class   40674
#> 19               Theria        clade   32525
#> 20             Eutheria        clade    9347
#> 21        Boreoeutheria        clade 1437010
#> 22       Laurasiatheria   superorder  314145
#> 23         Artiodactyla        order   91561
#> 24                Suina     suborder   35497
#> 25               Suidae       family    9821
#> 26                  Sus        genus    9822
#> 27           Sus scrofa      species    9823
#> 
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"

Created on 2023-01-12 with reprex v2.0.2

GossypiumH commented 1 year ago

Hmm! Thank you, that should probably work!
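
Since the original goal was a table of ranks rather than the full lineage, the list returned by `classification()` still needs to be reduced to the ranks of interest. Below is a minimal sketch of one way to do that, assuming the result has the structure shown in the reprex above (a named list of data frames with `name`, `rank` and `id` columns). `ranks_to_table` is a hypothetical helper, not part of taxize or taxizedb, and the default rank labels are taken from the reprex output; other lineages or database versions may use different labels.

```r
# Hypothetical helper: reduce a classification() result to one row per taxon
# with one column per requested rank.
ranks_to_table <- function(cls, ranks = c("superkingdom", "kingdom", "phylum",
                                          "class", "order")) {
  rows <- lapply(cls, function(df) {
    # Assumption: entries that are not data frames (e.g. NA) are unresolved taxa
    if (!is.data.frame(df)) {
      return(setNames(rep(NA_character_, length(ranks)), ranks))
    }
    # match() picks the first row carrying each rank; missing ranks become NA
    setNames(as.character(df$name[match(ranks, df$rank)]), ranks)
  })
  # Stack the per-taxon vectors into a character matrix, then a data frame
  data.frame(id = names(cls), do.call(rbind, rows),
             stringsAsFactors = FALSE, row.names = NULL)
}
```

With taxizedb loaded and the NCBI database downloaded, the call might look like `ranks_to_table(classification(name2taxid(taxa$specie_ID)))`, where `taxa$specie_ID` is the column from the original script. Names that do not resolve to a single taxon ID (possibly including entries such as 'Xenorhabdus sp.') would need to be handled before this step.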