ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

classification() unable to find NCBI accession number known to exist in database #925

Closed aubreyodom closed 5 months ago

aubreyodom commented 6 months ago
Session Info

```r
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.1 (2023-06-16)
 os       AlmaLinux 8.9 (Midnight Oncilla)
 system   x86_64, linux-gnu
 ui       RStudio
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-01-04
 rstudio  2023.06.1+524 Mountain Hydrangea (server)
 pandoc   3.1.1 @ /usr/local/ood/rstudio-server-2023.06.1-524/bin/quarto/bin/tools/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version  date (UTC) lib source
 ape           5.7-1    2023-03-13 [2] CRAN (R 4.3.1)
 bold          1.3.0    2023-05-02 [1] CRAN (R 4.3.1)
 cachem        1.0.8    2023-05-01 [2] CRAN (R 4.3.1)
 callr         3.7.3    2022-11-02 [2] CRAN (R 4.3.1)
 cli           3.6.1    2023-03-23 [2] CRAN (R 4.3.1)
 codetools     0.2-19   2023-02-01 [2] CRAN (R 4.3.1)
 conditionz    0.1.0    2019-04-24 [1] CRAN (R 4.3.1)
 crayon        1.5.2    2022-09-29 [2] CRAN (R 4.3.1)
 crul          1.4.0    2023-05-17 [2] CRAN (R 4.3.1)
 curl          5.1.0    2023-10-02 [1] CRAN (R 4.3.1)
 data.table    1.14.8   2023-02-17 [2] CRAN (R 4.3.1)
 devtools      2.4.5    2022-10-11 [1] CRAN (R 4.3.1)
 digest        0.6.33   2023-07-07 [1] CRAN (R 4.3.1)
 ellipsis      0.3.2    2021-04-29 [2] CRAN (R 4.3.1)
 evaluate      0.23     2023-11-01 [1] CRAN (R 4.3.1)
 fastmap       1.1.1    2023-02-24 [2] CRAN (R 4.3.1)
 foreach       1.5.2    2022-02-02 [2] CRAN (R 4.3.1)
 fs            1.6.3    2023-07-20 [1] CRAN (R 4.3.1)
 glue          1.6.2    2022-02-24 [2] CRAN (R 4.3.1)
 htmltools     0.5.7    2023-11-03 [1] CRAN (R 4.3.1)
 htmlwidgets   1.6.2    2023-03-17 [2] CRAN (R 4.3.1)
 httpcode      0.3.0    2020-04-10 [2] CRAN (R 4.3.1)
 httpuv        1.6.11   2023-05-11 [2] CRAN (R 4.3.1)
 iterators     1.0.14   2022-02-05 [2] CRAN (R 4.3.1)
 jsonlite      1.8.7    2023-06-29 [1] CRAN (R 4.3.1)
 knitr         1.45     2023-10-30 [1] CRAN (R 4.3.1)
 later         1.3.1    2023-05-02 [2] CRAN (R 4.3.1)
 lattice       0.21-8   2023-04-05 [2] CRAN (R 4.3.1)
 lifecycle     1.0.4    2023-11-07 [1] CRAN (R 4.3.1)
 magrittr      2.0.3    2022-03-30 [2] CRAN (R 4.3.1)
 memoise       2.0.1    2021-11-26 [2] CRAN (R 4.3.1)
 mime          0.12     2021-09-28 [2] CRAN (R 4.3.1)
 miniUI        0.1.1.1  2018-05-18 [2] CRAN (R 4.3.1)
 nlme          3.1-162  2023-01-31 [2] CRAN (R 4.3.1)
 pkgbuild      1.4.1    2023-06-14 [2] CRAN (R 4.3.1)
 pkgload       1.3.2    2022-11-16 [2] CRAN (R 4.3.1)
 prettyunits   1.2.0    2023-09-24 [1] CRAN (R 4.3.1)
 processx      3.8.1    2023-04-18 [2] CRAN (R 4.3.1)
 profvis       0.3.8    2023-05-02 [2] CRAN (R 4.3.1)
 promises      1.2.0.1  2021-02-11 [2] CRAN (R 4.3.1)
 ps            1.7.5    2023-04-18 [2] CRAN (R 4.3.1)
 purrr         1.0.2    2023-08-10 [1] CRAN (R 4.3.1)
 R6            2.5.1    2021-08-19 [2] CRAN (R 4.3.1)
 Rcpp          1.0.11   2023-07-06 [1] CRAN (R 4.3.1)
 remotes       2.4.2    2021-11-30 [2] CRAN (R 4.3.1)
 rlang         1.1.2    2023-11-04 [1] CRAN (R 4.3.1)
 rmarkdown     2.25     2023-09-18 [1] CRAN (R 4.3.1)
 rstudioapi    0.15.0   2023-07-07 [1] CRAN (R 4.3.1)
 sessioninfo   1.2.2    2021-12-06 [2] CRAN (R 4.3.1)
 shiny         1.7.4    2022-12-15 [2] CRAN (R 4.3.1)
 stringi       1.8.1    2023-11-13 [1] CRAN (R 4.3.1)
 stringr       1.5.1    2023-11-14 [1] CRAN (R 4.3.1)
 taxize        0.9.100  2022-04-22 [1] CRAN (R 4.3.1)
 triebeard     0.4.1    2023-03-04 [2] CRAN (R 4.3.1)
 urlchecker    1.0.1    2021-11-30 [2] CRAN (R 4.3.1)
 urltools      1.7.3    2019-04-14 [2] CRAN (R 4.3.1)
 usethis       2.2.0    2023-06-06 [2] CRAN (R 4.3.1)
 uuid          1.1-1    2023-08-17 [1] CRAN (R 4.3.1)
 vctrs         0.6.4    2023-10-12 [1] CRAN (R 4.3.1)
 xfun          0.41     2023-11-01 [1] CRAN (R 4.3.1)
 xml2          1.3.5    2023-07-06 [1] CRAN (R 4.3.1)
 xtable        1.8-6    2020-06-19 [2] R-Forge (R 4.3.1)
 zoo           1.8-13   2023-06-05 [2] R-Forge (R 4.3.1)

 [1] /usr4/spclpgm/aodom/R/x86_64-pc-linux-gnu-library/4.3
 [2] /share/pkg.8/r/4.3.1/install/lib64/R/library
```

Hi there,

I'm finding that the classification() function is unable to pull information for various taxa from the NCBI nucleotide database as of late. As a toy example, take accession "NZ_LQBM01000006.1" (or "NZ_LQBM01000006"). This is present in the NCBI database as "Nesterenkonia jeotgali strain CD08_7 CD08_7_contig_6_consensus, whole genome shotgun sequence" but fails to return output with classification().

taxize::classification("NZ_LQBM01000006.1", db = "ncbi",
                       key = NULL,
                       max_tries = 3)

This returns the following output:

══  1 queries  ═══════════════

Retrieving data for taxon 'NZ_LQBM01000006.1'

Not found. Consider checking the spelling or alternate classification
══  Results  ═════════════════

• Total: 1 
• Found: 0 
• Not Found: 0
No ENTREZ API key provided
 Get one via taxize::use_entrez()
See https://ncbiinsights.ncbi.nlm.nih.gov/2017/11/02/new-api-keys-for-the-e-utilities/
$NZ_LQBM01000006.1
[1] NA

attr(,"class")
[1] "classification"
attr(,"db")
[1] "ncbi"
Warning message:
Giving up on query after 3 tries. NAs will be returned. 

Any help or insight would be greatly appreciated. Thanks!

zachary-foster commented 6 months ago

The classification function takes taxon IDs or taxon names as input, so you need to convert your sequence accession number to a taxon ID like so:

taxize::classification(taxize::genbank2uid("NZ_LQBM01000006.1"), db = "ncbi")
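
If a script receives a mix of accession numbers and taxon names, the conversion step can be applied only where it is needed. The following is a minimal sketch, not part of taxize: `looks_like_accession()` and `classify_ncbi()` are hypothetical helper names, and the regular expression is only a rough guess at the shape of GenBank/RefSeq accessions (optional two-letter prefix plus underscore, letters, digits, optional version suffix). It will not cover every accession format.

```r
# Hypothetical helper (not a taxize function): rough test for strings
# shaped like a GenBank/RefSeq accession, e.g. "NZ_LQBM01000006.1".
# The pattern is an assumption and does not cover every accession format.
looks_like_accession <- function(x) {
  grepl("^([A-Z]{2}_)?[A-Z]{1,6}[0-9]{5,}(\\.[0-9]+)?$", x)
}

# Hypothetical wrapper: convert an accession to an NCBI taxon ID with
# genbank2uid() first, otherwise pass the input to classification() unchanged.
classify_ncbi <- function(x, ...) {
  stopifnot(is.character(x), length(x) == 1)
  if (looks_like_accession(x)) {
    x <- taxize::genbank2uid(x)  # accession -> taxon ID (queries NCBI)
  }
  taxize::classification(x, db = "ncbi", ...)
}
```

With these definitions, `classify_ncbi("NZ_LQBM01000006.1")` goes through `genbank2uid()` first, while `classify_ncbi("Nesterenkonia jeotgali")` is passed to `classification()` as a taxon name. Both calls need network access to NCBI.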

aubreyodom commented 5 months ago

That resolves my issue - thank you!