ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

Error getting all children of M tuberculosis complex #850

Closed pgcudahy closed 3 years ago

pgcudahy commented 3 years ago
Session info

```r
─────────
 setting  value
 version  R version 3.6.3 (2020-02-29)
 os       Ubuntu 18.04.5 LTS
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2020-10-06

─ Packages ─────────
 package     * version    date       lib source
 ape           5.4-1      2020-08-13 [1] CRAN (R 3.6.3)
 assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.3)
 backports     1.1.9      2020-08-24 [1] CRAN (R 3.6.3)
 base64enc     0.1-3      2015-07-28 [1] CRAN (R 3.6.3)
 blob          1.2.1      2020-01-20 [1] CRAN (R 3.6.3)
 bold          1.1.0      2020-06-17 [1] CRAN (R 3.6.3)
 broom         0.7.0.9001 2020-08-26 [1] Github (tidymodels/broom@7b50032)
 callr         3.4.3      2020-03-28 [1] CRAN (R 3.6.3)
 cellranger    1.1.0      2016-07-27 [1] CRAN (R 3.6.3)
 cli           2.0.2      2020-02-28 [1] CRAN (R 3.6.3)
 codetools     0.2-16     2018-12-24 [1] CRAN (R 3.6.3)
 colorspace    1.4-1      2019-03-18 [1] CRAN (R 3.6.3)
 conditionz    0.1.0      2019-04-24 [1] CRAN (R 3.6.3)
 crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.3)
 crul          1.0.0      2020-07-30 [1] CRAN (R 3.6.3)
 curl          4.3        2019-12-02 [1] CRAN (R 3.6.3)
 data.table    1.13.0     2020-07-24 [1] CRAN (R 3.6.3)
 DBI           1.1.0      2019-12-15 [1] CRAN (R 3.6.3)
 dbplyr        1.4.4      2020-05-27 [1] CRAN (R 3.6.3)
 desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.3)
 devtools      2.3.1      2020-07-21 [1] CRAN (R 3.6.3)
 digest        0.6.25     2020-02-23 [1] CRAN (R 3.6.3)
 dplyr       * 1.0.2      2020-08-18 [1] CRAN (R 3.6.3)
 ellipsis      0.3.1      2020-05-15 [1] CRAN (R 3.6.3)
 evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.3)
 fansi         0.4.1      2020-01-08 [1] CRAN (R 3.6.3)
 forcats     * 0.5.0      2020-03-01 [1] CRAN (R 3.6.3)
 foreach       1.5.0      2020-03-30 [1] CRAN (R 3.6.3)
 fs            1.5.0      2020-07-31 [1] CRAN (R 3.6.3)
 generics      0.0.2      2018-11-29 [1] CRAN (R 3.6.3)
 getPass       0.2-2      2017-07-21 [1] CRAN (R 3.6.3)
 ggplot2     * 3.3.2      2020-06-19 [1] CRAN (R 3.6.3)
 glue          1.4.1      2020-05-13 [1] CRAN (R 3.6.3)
 gtable        0.3.0      2019-03-25 [1] CRAN (R 3.6.3)
 haven         2.3.1      2020-06-01 [1] CRAN (R 3.6.3)
 hms           0.5.3      2020-01-08 [1] CRAN (R 3.6.3)
 htmltools     0.5.0      2020-06-16 [1] CRAN (R 3.6.3)
 httpcode      0.3.0      2020-04-10 [1] CRAN (R 3.6.3)
 httr          1.4.2      2020-07-20 [1] CRAN (R 3.6.3)
 IRdisplay     0.7.0      2018-11-29 [1] CRAN (R 3.6.3)
 IRkernel      1.1.1      2020-07-20 [1] CRAN (R 3.6.3)
 iterators     1.0.12     2019-07-26 [1] CRAN (R 3.6.3)
 jsonlite      1.7.0      2020-06-25 [1] CRAN (R 3.6.3)
 lattice       0.20-41    2020-04-02 [1] CRAN (R 3.6.3)
 lifecycle     0.2.0      2020-03-06 [1] CRAN (R 3.6.3)
 lubridate     1.7.9      2020-06-08 [1] CRAN (R 3.6.3)
 magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.3)
 memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.3)
 modelr        0.1.8      2020-05-19 [1] CRAN (R 3.6.3)
 munsell       0.5.0      2018-06-12 [1] CRAN (R 3.6.3)
 nlme          3.1-148    2020-05-24 [1] CRAN (R 3.6.3)
 pbdZMQ        0.3-3      2018-05-05 [1] CRAN (R 3.6.3)
 pillar        1.4.6      2020-07-10 [1] CRAN (R 3.6.3)
 pkgbuild      1.1.0      2020-07-13 [1] CRAN (R 3.6.3)
 pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 3.6.3)
 pkgload       1.1.0      2020-05-29 [1] CRAN (R 3.6.3)
 plyr          1.8.6      2020-03-03 [1] CRAN (R 3.6.3)
 prettyunits   1.1.1      2020-01-24 [1] CRAN (R 3.6.3)
 processx      3.4.3      2020-07-05 [1] CRAN (R 3.6.3)
 ps            1.3.4      2020-08-11 [1] CRAN (R 3.6.3)
 purrr       * 0.3.4      2020-04-17 [1] CRAN (R 3.6.3)
 R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.3)
 Rcpp          1.0.5      2020-07-06 [1] CRAN (R 3.6.3)
 readr       * 1.3.1      2018-12-21 [1] CRAN (R 3.6.3)
 readxl        1.3.1      2019-03-13 [1] CRAN (R 3.6.3)
 remotes       2.2.0      2020-07-21 [1] CRAN (R 3.6.3)
 repr          1.1.0      2020-01-28 [1] CRAN (R 3.6.3)
 reprex        0.3.0      2019-05-16 [1] CRAN (R 3.6.3)
 reshape       0.8.8      2018-10-23 [1] CRAN (R 3.6.3)
 rlang         0.4.7      2020-07-09 [1] CRAN (R 3.6.3)
 rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.3)
 rstudioapi    0.11       2020-02-07 [1] CRAN (R 3.6.3)
 rvest         0.3.6      2020-07-25 [1] CRAN (R 3.6.3)
 scales        1.1.1      2020-05-11 [1] CRAN (R 3.6.3)
 sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.3)
 stringi       1.4.6      2020-02-17 [1] CRAN (R 3.6.3)
 stringr     * 1.4.0      2019-02-10 [1] CRAN (R 3.6.3)
 taxize      * 0.9.98     2020-09-18 [1] CRAN (R 3.6.3)
 testthat      2.3.2      2020-03-02 [1] CRAN (R 3.6.3)
 tibble      * 3.0.3      2020-07-10 [1] CRAN (R 3.6.3)
 tidyr       * 1.1.1      2020-07-31 [1] CRAN (R 3.6.3)
 tidyselect    1.1.0      2020-05-11 [1] CRAN (R 3.6.3)
 tidyverse   * 1.3.0      2019-11-21 [1] CRAN (R 3.6.3)
 triebeard     0.3.0      2016-08-04 [1] CRAN (R 3.6.3)
 urltools      1.7.3      2019-04-14 [1] CRAN (R 3.6.3)
 usethis       1.6.1      2020-04-29 [1] CRAN (R 3.6.3)
 uuid          0.1-4      2020-02-26 [1] CRAN (R 3.6.3)
 vctrs         0.3.2      2020-07-15 [1] CRAN (R 3.6.3)
 withr         2.2.0      2020-04-20 [1] CRAN (R 3.6.3)
 xml2          1.3.2      2020-04-23 [1] CRAN (R 3.6.3)
 zoo           1.8-8      2020-05-02 [1] CRAN (R 3.6.3)

[1] /usr/local/lib/R/site-library
[2] /usr/lib/R/site-library
[3] /usr/lib/R/library
```

I'm trying to get the taxon IDs of all children of the Mycobacterium tuberculosis complex (uid = 77643), but the downstream() call below fails with this error:

```r
downstream(as.uid(77643), db = 'ncbi', downto = 'no rank', intermediate = TRUE)
```

```
Number of ids long; we're splitting up into chunks for multiple HTTP requests
Error in intermed[[iter]]: subscript out of bounds
Traceback:

1. downstream(as.uid(77643), db = "ncbi", downto = "no rank", intermediate = TRUE)
2. downstream.uid(as.uid(77643), db = "ncbi", downto = "no rank", 
 .     intermediate = TRUE)
3. lapply(sci_id, fun, downto = downto, intermediate = intermediate, 
 .     ...)
4. FUN(X[[i]], ...)
5. ncbi_downstream(id = y, downto = downto, intermediate = intermediate, 
 .     ...)
```

Any idea where I'm going wrong?

sckott commented 3 years ago

thanks for the issue and for including your session info. i'll have a look

sckott commented 3 years ago

there was a bug in there that I fixed, but the main issue is that "no rank" is not a rank that can be used here. Taxa labeled "no rank" can sit anywhere in the rank hierarchy, so it cannot be used as the downto target in downstream().
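To make the point about "no rank" concrete, here is a minimal sketch that does not come from the thread. The lineage is a toy table with made-up labels (not fetched from NCBI), and rank_depths() is a hypothetical helper that reports the depths at which a rank label occurs.

```r
# Toy lineage with illustrative labels only; not real NCBI data.
lineage <- data.frame(
  name = c("root clade", "some genus", "some complex",
           "some species", "some isolate"),
  rank = c("no rank", "genus", "no rank", "species", "no rank"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: depths (1 = top of the lineage) where a rank label occurs.
rank_depths <- function(lineage, rank) {
  which(lineage$rank == rank)
}

rank_depths(lineage, "species")  # one depth, so it marks a clear stopping point
rank_depths(lineage, "no rank")  # several depths, so no single stopping point
```

A named rank such as "species" identifies one level to stop at, while "no rank" matches several levels at once, which is why it gives a traversal nothing to aim for.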

for your case perhaps "strain" would work?

```r
z <- downstream(as.uid(77643), db = 'ncbi', downto = 'strain')
head(z[[1]])
  childtaxa_id                        childtaxa_name   rank
1      1305739        Mycobacterium orygis 112400015 strain
2      1246634 Mycobacterium canettii CIPT 140070007 strain
3      1246121 Mycobacterium canettii CIPT 140070013 strain
4      1246120 Mycobacterium canettii CIPT 140070005 strain
5      1246119 Mycobacterium canettii CIPT 140070002 strain
6      1205677 Mycobacterium canettii CIPT 140070017 strain
```
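
If the taxa between the complex and the strains matter too, one option is to request them with intermediate = TRUE and bind everything into one table. The sketch below is not from the thread: collect_descendants() is a hypothetical helper, and it assumes each element of the result is a list with a target data.frame and an intermediate list of data.frames that share the same columns. The mock input reuses two rows from the output above plus a made-up placeholder row, so it runs without network access.

```r
# Hypothetical helper: merge the target-rank taxa and the intermediate taxa
# from one element of a downstream(..., intermediate = TRUE) result.
# Assumes `res` has a `target` data.frame and an `intermediate` list of
# data.frames with the same columns, including `childtaxa_id`.
collect_descendants <- function(res) {
  parts <- c(list(res$target), res$intermediate)
  # Keep only non-empty data.frames so rbind() has consistent input.
  parts <- Filter(function(p) is.data.frame(p) && nrow(p) > 0, parts)
  if (length(parts) == 0) {
    return(NULL)
  }
  out <- do.call(rbind, parts)
  # A taxon could appear in more than one piece; keep its first occurrence.
  out <- out[!duplicated(out$childtaxa_id), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Mock result: two strain rows copied from the output above, plus one
# made-up intermediate row used purely as a placeholder.
mock <- list(
  target = data.frame(
    childtaxa_id = c("1305739", "1246634"),
    childtaxa_name = c("Mycobacterium orygis 112400015",
                       "Mycobacterium canettii CIPT 140070007"),
    rank = "strain",
    stringsAsFactors = FALSE
  ),
  intermediate = list(
    data.frame(
      childtaxa_id = "0",
      childtaxa_name = "placeholder species",
      rank = "species",
      stringsAsFactors = FALSE
    )
  )
)

all_taxa <- collect_descendants(mock)
nrow(all_taxa)  # 3

# With network access, the real call would look like this (not run here):
# library(taxize)
# z <- downstream(as.uid(77643), db = "ncbi", downto = "strain",
#                 intermediate = TRUE)
# all_taxa <- collect_descendants(z[[1]])
```

This only gathers what downstream() returns on the way down to "strain". Taxa that sit on branches with no strain-level descendants may still be missing, so it is worth checking the result against the NCBI Taxonomy browser.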