ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

Misplaced elements in output of taxize::classification for a vector containing a mix of valid/invalid NCBI taxids #887

Closed: sdgamboa closed this issue 2 years ago

sdgamboa commented 2 years ago

Hello, I'd like to report a bug in the taxize::classification function that appears when the input vector contains a mix of valid and invalid NCBI taxonomy IDs. In that case the elements of the output list no longer line up with their names: instead of returning NA for an invalid ID, as taxizedb::classification does, the function assigns it the result that belongs to the next valid ID (see elements 3-7 in the example code below).

Thank you for such a useful package!

Samuel

Reprex:

taxize::classification(1:7, db = "ncbi")
#> No ENTREZ API key provided
#>  Get one via taxize::use_entrez()
#> See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
#> Warning in out[!is.na(x)] <- query_ncbi(x[!is.na(x)]): number of items to
#> replace is not a multiple of replacement length
#> $`1`
#>   name    rank id
#> 1 root no rank  1
#> 
#> $`2`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2           Bacteria superkingdom      2
#> 
#> $`3`
#>                  name         rank     id
#> 1  cellular organisms      no rank 131567
#> 2            Bacteria superkingdom      2
#> 3      Proteobacteria       phylum   1224
#> 4 Alphaproteobacteria        class  28211
#> 5    Hyphomicrobiales        order    356
#> 6   Xanthobacteraceae       family 335928
#> 7        Azorhizobium        genus      6
#> 
#> $`4`
#>                       name         rank     id
#> 1       cellular organisms      no rank 131567
#> 2                 Bacteria superkingdom      2
#> 3           Proteobacteria       phylum   1224
#> 4      Alphaproteobacteria        class  28211
#> 5         Hyphomicrobiales        order    356
#> 6        Xanthobacteraceae       family 335928
#> 7             Azorhizobium        genus      6
#> 8 Azorhizobium caulinodans      species      7
#> 
#> $`5`
#>   name    rank id
#> 1 root no rank  1
#> 
#> $`6`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2           Bacteria superkingdom      2
#> 
#> $`7`
#>                  name         rank     id
#> 1  cellular organisms      no rank 131567
#> 2            Bacteria superkingdom      2
#> 3      Proteobacteria       phylum   1224
#> 4 Alphaproteobacteria        class  28211
#> 5    Hyphomicrobiales        order    356
#> 6   Xanthobacteraceae       family 335928
#> 7        Azorhizobium        genus      6
#> 
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"
taxizedb::classification(1:7, db = "ncbi")
#> $`1`
#> [1] NA
#> 
#> $`2`
#>                 name         rank     id
#> 1 cellular organisms      no rank 131567
#> 2           Bacteria superkingdom      2
#> 
#> $`3`
#> [1] NA
#> 
#> $`4`
#> [1] NA
#> 
#> $`5`
#> [1] NA
#> 
#> $`6`
#>                  name         rank     id
#> 1  cellular organisms      no rank 131567
#> 2            Bacteria superkingdom      2
#> 3      Proteobacteria       phylum   1224
#> 4 Alphaproteobacteria        class  28211
#> 5    Hyphomicrobiales        order    356
#> 6   Xanthobacteraceae       family 335928
#> 7        Azorhizobium        genus      6
#> 
#> $`7`
#>                       name         rank     id
#> 1       cellular organisms      no rank 131567
#> 2                 Bacteria superkingdom      2
#> 3           Proteobacteria       phylum   1224
#> 4      Alphaproteobacteria        class  28211
#> 5         Hyphomicrobiales        order    356
#> 6        Xanthobacteraceae       family 335928
#> 7             Azorhizobium        genus      6
#> 8 Azorhizobium caulinodans      species      7
#> 
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"
packageVersion("taxize")
#> [1] '0.9.99.947'
packageVersion("taxizedb")
#> [1] '0.3.0'

Created on 2021-12-24 by the reprex package (v2.0.1)
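
The warning in the output above ("number of items to replace is not a multiple of replacement length") suggests that a shorter list of results is being recycled into a longer list of slots. The following is a minimal base R sketch of that mechanism, not the actual taxize code: `fake_query` is a hypothetical stand-in that returns one entry per valid ID only, and the set of valid IDs is hard-coded as an assumption to mirror the taxize output above.

```r
# Hypothetical stand-in for the remote lookup (NOT part of taxize):
# it returns one named entry per valid ID and silently drops the rest.
fake_query <- function(ids) {
  valid <- c("1", "2", "6", "7")  # assumed valid IDs, for illustration only
  found <- intersect(as.character(ids), valid)
  stats::setNames(as.list(paste0("lineage_", found)), found)
}

ids <- as.character(1:7)
res <- fake_query(ids)  # 4 results for 7 requested IDs

# Positional assignment: the 4 results are recycled across 7 slots,
# so names and contents drift apart (R also emits the recycling warning).
bad <- stats::setNames(vector("list", length(ids)), ids)
bad[seq_along(ids)] <- suppressWarnings(rep_len(res, length(ids)))

# Name-based assignment: every result lands in the slot of its own ID,
# and IDs without a result keep the NA placeholder.
good <- stats::setNames(rep(list(NA), length(ids)), ids)
good[names(res)] <- res
```

In this sketch `bad[["3"]]` holds the entry for ID 6 and `bad[["5"]]` wraps around to the entry for ID 1, which is the same pattern as in the taxize output above, while `good[["3"]]` stays `NA`.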

Session Info

```r
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.1 (2021-08-10)
#>  os       Ubuntu 20.04.3 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US:en
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Merida
#>  date     2021-12-24
#>  pandoc   2.14.0.3 @ /usr/lib/rstudio/bin/pandoc/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  ape           5.6        2021-12-21 [1] CRAN (R 4.1.1)
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.4.1      2021-12-13 [1] CRAN (R 4.1.1)
#>  bit           4.0.4      2020-08-04 [1] CRAN (R 4.1.0)
#>  bit64         4.0.5      2020-08-30 [1] CRAN (R 4.1.0)
#>  blob          1.2.2      2021-07-23 [1] CRAN (R 4.1.0)
#>  bold          1.2.0      2021-05-11 [1] CRAN (R 4.1.0)
#>  cachem        1.0.6      2021-08-19 [1] CRAN (R 4.1.0)
#>  callr         3.7.0      2021-04-20 [1] CRAN (R 4.1.0)
#>  cli           3.1.0      2021-10-27 [1] CRAN (R 4.1.0)
#>  codetools     0.2-18     2020-11-04 [2] CRAN (R 4.1.1)
#>  conditionz    0.1.0      2019-04-24 [1] CRAN (R 4.1.0)
#>  crayon        1.4.2      2021-10-29 [1] CRAN (R 4.1.1)
#>  crul          1.2.0      2021-11-22 [1] CRAN (R 4.1.1)
#>  curl          4.3.2      2021-06-23 [1] CRAN (R 4.1.0)
#>  data.table    1.14.2     2021-09-27 [1] CRAN (R 4.1.0)
#>  DBI           1.1.2      2021-12-20 [1] CRAN (R 4.1.1)
#>  dbplyr        2.1.1      2021-04-06 [1] CRAN (R 4.1.0)
#>  desc          1.4.0      2021-09-28 [1] CRAN (R 4.1.0)
#>  devtools      2.4.3      2021-11-30 [1] CRAN (R 4.1.1)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.1.1)
#>  dplyr         1.0.7      2021-06-18 [1] CRAN (R 4.1.0)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0      2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.1.0)
#>  foreach       1.5.1      2020-10-15 [1] CRAN (R 4.1.0)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.1.1)
#>  generics      0.1.1      2021-10-25 [1] CRAN (R 4.1.0)
#>  glue          1.6.0      2021-12-17 [1] CRAN (R 4.1.1)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.1.0)
#>  hoardr        0.5.2      2018-12-02 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.1.0)
#>  httpcode      0.3.0      2020-04-10 [1] CRAN (R 4.1.0)
#>  iterators     1.0.13     2020-10-15 [1] CRAN (R 4.1.0)
#>  jsonlite      1.7.2      2020-12-09 [1] CRAN (R 4.1.0)
#>  knitr         1.37       2021-12-16 [1] CRAN (R 4.1.1)
#>  lattice       0.20-45    2021-09-22 [2] CRAN (R 4.1.1)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.1.0)
#>  magrittr      2.0.1      2020-11-17 [1] CRAN (R 4.1.0)
#>  memoise       2.0.1      2021-11-26 [1] CRAN (R 4.1.1)
#>  nlme          3.1-153    2021-09-07 [2] CRAN (R 4.1.1)
#>  pillar        1.6.4      2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgbuild      1.3.1      2021-12-20 [1] CRAN (R 4.1.1)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.1.0)
#>  pkgload       1.2.4      2021-11-30 [1] CRAN (R 4.1.1)
#>  plyr          1.8.6      2020-03-03 [1] CRAN (R 4.1.0)
#>  prettyunits   1.1.1      2020-01-24 [1] CRAN (R 4.1.0)
#>  processx      3.5.2      2021-04-30 [1] CRAN (R 4.1.0)
#>  ps            1.6.0      2021-02-28 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.1.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.1.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.1.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.1.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.1.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.1.0)
#>  rappdirs      0.3.3      2021-01-31 [1] CRAN (R 4.1.0)
#>  Rcpp          1.0.7      2021-07-07 [1] CRAN (R 4.1.0)
#>  remotes       2.4.2      2021-11-30 [1] CRAN (R 4.1.1)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.1.0)
#>  reshape       0.8.8      2018-10-23 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12     2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11       2021-09-14 [1] CRAN (R 4.1.0)
#>  rprojroot     2.0.2      2020-11-15 [1] CRAN (R 4.1.0)
#>  RSQLite       2.2.9      2021-12-06 [1] CRAN (R 4.1.1)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.1.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.1.1)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.6.2      2021-09-23 [1] CRAN (R 4.1.0)
#>  taxize        0.9.99.947 2021-12-24 [1] Github (ropensci/taxize@a4db9a7)
#>  taxizedb      0.3.0      2021-01-15 [1] CRAN (R 4.1.0)
#>  testthat      3.1.1      2021-12-03 [1] CRAN (R 4.1.1)
#>  tibble        3.1.6      2021-11-07 [1] CRAN (R 4.1.1)
#>  tidyselect    1.1.1      2021-04-30 [1] CRAN (R 4.1.0)
#>  triebeard     0.3.0      2016-08-04 [1] CRAN (R 4.1.0)
#>  urltools      1.7.3      2019-04-14 [1] CRAN (R 4.1.0)
#>  usethis       2.1.5      2021-12-09 [1] CRAN (R 4.1.1)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.1.0)
#>  uuid          1.0-3      2021-11-01 [1] CRAN (R 4.1.1)
#>  vctrs         0.3.8      2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.3      2021-11-30 [1] CRAN (R 4.1.1)
#>  xfun          0.29       2021-12-14 [1] CRAN (R 4.1.1)
#>  xml2          1.3.3      2021-11-30 [1] CRAN (R 4.1.1)
#>  yaml          2.2.1      2020-02-01 [1] CRAN (R 4.1.0)
#>  zoo           1.8-9      2021-03-09 [1] CRAN (R 4.1.0)
#>
#>  [1] /home/usuario/R/x86_64-pc-linux-gnu-library/4.1
#>  [2] /home/usuario/Apps/R-4.1.1/library
#>
#> ────────────────────────────────
```

zachary-foster commented 2 years ago

Thanks for the report! I am looking into it.

zachary-foster commented 2 years ago

Hello @sdgamboa,

I have fixed this issue on the current development branch for the next release. You can install it with the following code.

devtools::install_github('ropensci/taxize@1.0-rc')

However, this is a major release, so it might break other code.

library(taxize)
taxize::classification(1:7, db = "ncbi")
#> $`1`
#> # A tibble: 1 × 3
#>   name  rank    id   
#>   <chr> <chr>   <chr>
#> 1 root  no rank 1    
#> 
#> $`2`
#> # A tibble: 2 × 3
#>   name               rank         id    
#>   <chr>              <chr>        <chr> 
#> 1 cellular organisms no rank      131567
#> 2 Bacteria           superkingdom 2     
#> 
#> $`3`
#> NULL
#> 
#> $`4`
#> NULL
#> 
#> $`5`
#> NULL
#> 
#> $`6`
#> # A tibble: 7 × 3
#>   name                rank         id    
#>   <chr>               <chr>        <chr> 
#> 1 cellular organisms  no rank      131567
#> 2 Bacteria            superkingdom 2     
#> 3 Proteobacteria      phylum       1224  
#> 4 Alphaproteobacteria class        28211 
#> 5 Hyphomicrobiales    order        356   
#> 6 Xanthobacteraceae   family       335928
#> 7 Azorhizobium        genus        6     
#> 
#> $`7`
#> # A tibble: 8 × 3
#>   name                     rank         id    
#>   <chr>                    <chr>        <chr> 
#> 1 cellular organisms       no rank      131567
#> 2 Bacteria                 superkingdom 2     
#> 3 Proteobacteria           phylum       1224  
#> 4 Alphaproteobacteria      class        28211 
#> 5 Hyphomicrobiales         order        356   
#> 6 Xanthobacteraceae        family       335928
#> 7 Azorhizobium             genus        6     
#> 8 Azorhizobium caulinodans species      7     
#> 
#> attr(,"class")
#> [1] "classification"
#> attr(,"db")
#> [1] "ncbi"

Created on 2022-01-17 by the reprex package (v2.0.1)
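
Note that in the output above unresolved IDs come back as NULL, whereas the taxizedb output earlier in this thread uses NA. Code that has to cope with both shapes could test for either. The following is only a sketch in base R: `is_unresolved` and `unresolved_ids` are hypothetical helper names, not part of taxize or taxizedb, and the two input lists are hand-built mocks rather than real query results.

```r
# Hypothetical helper (not part of taxize/taxizedb): TRUE when a list
# element is NULL or a single NA, i.e. no classification was returned.
is_unresolved <- function(x) {
  is.null(x) || (is.atomic(x) && length(x) == 1L && is.na(x))
}

# Names (IDs) of all unresolved elements in a classification-like list.
unresolved_ids <- function(res) {
  names(res)[vapply(res, is_unresolved, logical(1))]
}

# Hand-built mocks of the two shapes seen in this thread
res_null <- list(`2` = data.frame(name = "Bacteria"), `3` = NULL)
res_na   <- list(`2` = data.frame(name = "Bacteria"), `3` = NA)

unresolved_ids(res_null)
unresolved_ids(res_na)
```

Both calls report ID "3" as unresolved, so downstream filtering would not need to know which package or version produced the list.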

sdgamboa commented 2 years ago

@zachary-foster, thank you!