Open arendsee opened 6 years ago
i get the same thing, will look
@arendsee
so this is the HTTP request made
wonder if there's anything in that request that strikes you as off, something we could change that would bring it in line with what `taxizedb` gives
also @zachary-foster or @dwinter maybe you have a sense for why results are different from ENTREZ API vs. a dump of their database?
one thing that I wonder about is that the version of the database that ENTREZ is using could differ from what any user has on their disk if using `taxizedb`
- one thing to note in docs somewhere at least
Hmm... not something I know much about, but I don't think it's an issue of versions. The browser is 'live', the FTP dumps are updated hourly and the eUtils database is updated daily.
I would guess there is some trick in exactly what to send to `elink`. Using `esearch` instead of `elink`, there is the special term `NXLV` for immediate descendants. This gets most of the ones missing from taxize:
```r
library(rentrez)

# Search the taxonomy database with the NXLV ("Next Level") term and keep the hits on the history server
one_down <- entrez_search(db="taxonomy", term="Bacteria[NXLV]", use_history=TRUE)
# Fetch summaries for every hit via the stored web history
summs <- entrez_summary(db="taxonomy", web_history=one_down$web_history)
# Tabulate name, rank, and taxid for each hit
t(extract_from_esummary(summs, c("scientificname", "rank", "taxid")))
```

```
        scientificname                    rank      taxid
1936987 "Balneolaeota"                    "phylum"  1936987
1930617 "Calditrichaeota"                 "phylum"  1930617
1853220 "Rhodothermaeota"                 "phylum"  1853220
1802340 "Nitrospinae/Tectomicrobia group" ""        1802340
1783272 "Terrabacteria group"             ""        1783272
1783270 "FCB group"                       ""        1783270
1783257 "PVC group"                       ""        1783257
 629425 "Bacteria ferula"                 "species" 629425
 629405 "Bacteria bahiensis"              "species" 629405
 629404 "Bacteria baculus"                "species" 629404
 629403 "Bacteria apolinari"              "species" 629403
 629401 "Bacteria ambigua"                "species" 629401
 629398 "Bacteria acuminatocercata"       "species" 629398
 629397 "Bacteria aborigena"              "species" 629397
 629396 "Bacteria abnormis"               "species" 629396
 508458 "Synergistetes"                   "phylum"  508458
 203691 "Spirochaetes"                    "phylum"  203691
 200940 "Thermodesulfobacteria"           "phylum"  200940
 200938 "Chrysiogenetes"                  "phylum"  200938
 200930 "Deferribacteres"                 "phylum"  200930
 200918 "Thermotogae"                     "phylum"  200918
 200783 "Aquificae"                       "phylum"  200783
  74152 "Elusimicrobia"                   "phylum"  74152
  68297 "Dictyoglomi"                     "phylum"  68297
  67814 "Caldiserica"                     "phylum"  67814
  57723 "Acidobacteria"                   "phylum"  57723
  48479 "environmental samples"           ""        48479
  40117 "Nitrospirae"                     "phylum"  40117
  32066 "Fusobacteria"                    "phylum"  32066
   2323 "unclassified Bacteria"           ""        2323
   1224 "Proteobacteria"                  "phylum"  1224
```
Not sure how helpful this is for the specific question, but it at least shows these taxa are accessible via eUtils.... :confused:
@sckott Hmm, nothing about the request seems off to me. Some of the missing phyla are fairly new, see https://www.ncbi.nlm.nih.gov/pubmed/27287844. I wonder if there is something screwy on the Entrez side? Stale cached values for children ("Next Level"), perhaps?
I am not sure either. Perhaps the `term=Bacteria[Next Level]` is filtering out some things that are associated with taxon ID 2, but not with "Bacteria" for some reason. Ideally, the `term` argument would not be needed, since we just want the child IDs for ID 2, regardless of the "term", but we were never able to get ENTREZ to do that.
By the way, the title of this issue sounds like an interesting science fiction novel.
thanks @dwinter @arendsee @zachary-foster
@dwinter your approach might work, though i'm not sure how we'd programmatically filter the results to get only the direct children. i guess we can consult our internal data.frame of ranks and their order and only pick the direct descendant rank from the one queried? thoughts folks?
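To make the rank-filtering idea concrete, here is a rough sketch of what it could look like. Everything in it is an assumption for illustration: `rank_order` is a made-up, abbreviated rank table (not taxize's actual internal data.frame), `direct_children()` is a hypothetical helper, and "superkingdom" is assumed to be the rank of the queried taxon. The candidate rows are a few taken from the `NXLV` listing above.

```r
# Hypothetical, abbreviated rank ordering (highest to lowest).
# An assumption for illustration, not taxize's internal rank table.
rank_order <- data.frame(
  rank = c("superkingdom", "kingdom", "phylum", "class",
           "order", "family", "genus", "species"),
  position = 1:8,
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep only the candidates whose rank is the
# highest-ranked one found below the parent's rank.
direct_children <- function(candidates, parent_rank, ranks = rank_order) {
  parent_pos <- ranks$position[match(parent_rank, ranks$rank)]
  stopifnot(!is.na(parent_pos))
  # Unranked rows (rank == "") do not match the table and become NA
  cand_pos <- ranks$position[match(candidates$rank, ranks$rank)]
  below <- !is.na(cand_pos) & cand_pos > parent_pos
  if (!any(below)) return(candidates[0, , drop = FALSE])
  candidates[below & cand_pos == min(cand_pos[below]), , drop = FALSE]
}

# A few rows copied from the NXLV results
candidates <- data.frame(
  scientificname = c("Synergistetes", "Terrabacteria group",
                     "Bacteria ferula", "Proteobacteria"),
  rank = c("phylum", "", "species", "phylum"),
  taxid = c(508458, 1783272, 629425, 1224),
  stringsAsFactors = FALSE
)

kids <- direct_children(candidates, "superkingdom")
kids$scientificname
```

On these rows the sketch keeps the two phyla and drops the "Bacteria ferula" species row, which is the filtering we want. It also drops "Terrabacteria group", since unranked entries have no position in the rank table, so rank alone would probably not be enough if those unranked groups count as direct children.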
Session Info
```r
Session info ------------------------------------------------------------------
 setting  value
 version  R version 3.4.3 (2017-11-30)
 system   x86_64, linux-gnu
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 tz       America/Chicago
 date     2018-02-03

Packages ----------------------------------------------------------------------
 package     * version    date       source
 ape           5.0        2017-10-30 cran (@5.0)
 assertthat    0.2.0      2017-04-11 CRAN (R 3.4.1)
 base        * 3.4.3      2017-11-30 local
 bindr         0.1        2016-11-13 CRAN (R 3.4.1)
 bindrcpp    * 0.2        2017-06-17 CRAN (R 3.4.1)
 bit           1.1-12     2014-04-09 CRAN (R 3.4.1)
 bit64         0.9-7      2017-05-08 CRAN (R 3.4.1)
 blob          1.1.0      2017-06-17 CRAN (R 3.4.1)
 bold          0.5.0      2017-07-21 CRAN (R 3.4.2)
 cli           1.0.0      2017-11-05 CRAN (R 3.4.3)
 codetools     0.2-15     2016-10-05 CRAN (R 3.4.1)
 colorout    * 1.1-2      2017-09-23 Github (jalvesaq/colorout@020a14d)
 commonmark    1.4        2017-09-01 CRAN (R 3.4.1)
 compiler      3.4.3      2017-11-30 local
 crayon        1.3.4      2017-09-16 CRAN (R 3.4.1)
 crul          0.5.0      2018-01-22 cran (@0.5.0)
 curl          3.1        2017-12-12 cran (@3.1)
 data.table    1.10.4-3   2017-10-27 cran (@1.10.4-)
 datasets    * 3.4.3      2017-11-30 local
 DBI           0.7        2017-06-18 CRAN (R 3.4.1)
 dbplyr        1.2.0      2018-01-03 cran (@1.2.0)
 devtools    * 1.13.4     2017-11-09 CRAN (R 3.4.2)
 digest        0.6.13     2017-12-14 CRAN (R 3.4.3)
 dplyr       * 0.7.4      2017-09-28 cran (@0.7.4)
 foreach       1.4.4      2017-12-12 CRAN (R 3.4.3)
 glue          1.2.0      2017-10-29 cran (@1.2.0)
 graphics    * 3.4.3      2017-11-30 local
 grDevices   * 3.4.3      2017-11-30 local
 grid          3.4.3      2017-11-30 local
 hms           0.4.0      2017-11-23 CRAN (R 3.4.2)
 hoardr        0.2.0      2017-05-10 CRAN (R 3.4.2)
 httr          1.3.1      2017-08-20 CRAN (R 3.4.1)
 iterators     1.0.9      2017-12-12 CRAN (R 3.4.3)
 jsonlite      1.5        2017-06-01 CRAN (R 3.4.1)
 lattice       0.20-35    2017-03-25 CRAN (R 3.4.3)
 magrittr    * 1.5        2014-11-22 CRAN (R 3.4.1)
 memoise       1.1.0      2017-04-21 CRAN (R 3.4.1)
 methods     * 3.4.3      2017-11-30 local
 nlme          3.1-131    2017-02-06 CRAN (R 3.4.3)
 parallel      3.4.3      2017-11-30 local
 pillar        1.1.0      2018-01-14 cran (@1.1.0)
 pkgconfig     2.0.1      2017-03-21 CRAN (R 3.4.1)
 plyr          1.8.4      2016-06-08 CRAN (R 3.4.1)
 pryr        * 0.1.3      2017-10-30 cran (@0.1.3)
 purrr         0.2.4      2017-10-18 CRAN (R 3.4.2)
 R6            2.2.2      2017-06-17 CRAN (R 3.4.1)
 rappdirs      0.3.1      2016-03-28 CRAN (R 3.4.2)
 Rcpp          0.12.15    2018-01-20 cran (@0.12.15)
 readr         1.1.1      2017-05-16 CRAN (R 3.4.1)
 reshape       0.8.7      2017-08-06 CRAN (R 3.4.2)
 reshape2      1.4.3      2017-12-11 cran (@1.4.3)
 rlang         0.1.6      2017-12-21 cran (@0.1.6)
 RMySQL        0.10.13    2017-08-14 CRAN (R 3.4.2)
 roxygen2      6.0.1      2017-02-06 CRAN (R 3.4.2)
 RPostgreSQL   0.6-2      2017-06-24 CRAN (R 3.4.2)
 RSQLite       2.0        2017-06-19 CRAN (R 3.4.2)
 stats       * 3.4.3      2017-11-30 local
 stringi       1.1.6      2017-11-17 CRAN (R 3.4.2)
 stringr       1.2.0      2017-02-18 CRAN (R 3.4.1)
 taxize      * 0.9.1.9321 2018-02-03 Github (ropensci/taxize@319e03d)
 taxizedb    * 0.1.6
```
The dev version of `taxize` produces the following:
This is missing several taxa retrieved from `taxizedb`:
Which also matches the taxa on NCBI taxonomy