ropensci / taxadb

:package: Taxonomic Database
https://docs.ropensci.org/taxadb

IUCN database #88

Open ccheng91 opened 3 years ago

ccheng91 commented 3 years ago

Hello, I am trying to work with the IUCN database in taxadb, but it seems the latest available version of the IUCN database is 2019.

> td_create("iucn")
could not find 2020_dwc_iucn, 2020_common_iucn 
  checking for older versions.
2020_dwc_iucn not available2020_common_iucn not available
> td_create("iucn",version = 2019)
Importing C:/Users/Cheng/AppData/Roaming/R/data/R/contentid/data/d9/1b/d91b51013b669a31fd268743cf2db866b0f3e7a7f1af78e60271fa5f137bd21e in 100000 line chunks:
[-] chunk 2 ...Done! (in 10.51822 secs)
Importing C:/Users/Cheng/AppData/Roaming/R/data/R/contentid/data/30/51/30516362af0a394a7a78677ae129a95101cb852bda20ad872ae042ced43c463c in 100000 line chunks:
    ...Done! (in 0.0776782 secs)
Warning messages:
1: In overwrite_db(con, tablename) : overwriting 2019_dwc_iucn
2: In read_chunked(con, lines, encoding) :
  connection has already been completely read

The real issue is that when I check the IUCN table, there are no terrestrial species in it. The acceptedNameUsageID values start with SLB:. Isn't that a different database (SeaLifeBase)?

> taxa_tbl("iucn",version=2019)
# Source:   table<2019_dwc_iucn> [?? x 14]
# Database: duckdb_connection
   taxonID scientificName           taxonRank    acceptedNameUsageID taxonomicStatus kingdom   phylum    class          order      family       genus       specificEpithet vernacularName infraspecificEpit~
   <chr>   <chr>                    <chr>        <chr>               <chr>           <chr>     <chr>     <chr>          <chr>      <chr>        <chr>       <chr>           <chr>          <chr>
 1 NA      Aaptos aaptos var. nigra variety      SLB:130062          synonym         Animalia  Mollusca  Gastropoda     Neogastro~ Conidae      Conus       nimbosus        NA             NA
 2 NA      Aaptos adriatica         species      SLB:51720           synonym         Plantae   Rhodophy~ Florideophyce~ Gigartina~ Areschougia~ Erythroclo~ angustatum      NA             NA
 3 NA      Aaptos chromis           species      SLB:130062          synonym         Animalia  Mollusca  Gastropoda     Neogastro~ Conidae      Conus       nobilis         NA             NA
 4 NA      Aaptos lithophaga        species      SLB:130708          synonym         NA        NA        NA             NA         NA           NA          NA              NA             NA
 5 NA      Abacola holothuriae      species      SLB:30411           synonym         Chromista Ochrophy~ Coscinodiscop~ Fragilari~ Fragilariac~ Fragilaria  constricta      NA             NA
 6 NA      Abanericola affinis afr~ subspecies   SLB:142030          synonym         NA        NA        NA             NA         NA           NA          NA              NA             NA
 7 NA      Abanericola claparedi    species      SLB:38944           synonym         NA        NA        NA             NA         NA           NA          NA              NA             NA
 8 NA      Abarenicola affinis aff~ nominotypic~ SLB:142030          accepted name   NA        NA        NA             NA         NA           NA          NA              NA             NA
 9 NA      Abarenicola affinis afr~ subspecies   SLB:142030          accepted name   NA        NA        NA             NA         NA           NA          NA              NA             NA
10 NA      Abatus nimrodi           species      SLB:152043          synonym         NA        NA        NA             NA         NA           NA          NA              NA             NA
# ... with more rows
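
One way to confirm which provider a table really came from is to tabulate the prefix in front of the colon in the identifier column. The following is a minimal base-R sketch; `id_prefix_counts` is a hypothetical helper written for illustration and is not part of taxadb, and the IDs are copied from the first rows of the output above.

```r
# Hypothetical helper (not part of taxadb): count the prefixes that appear
# before ":" in a vector of identifiers such as acceptedNameUsageID.
id_prefix_counts <- function(ids) {
  ids <- ids[!is.na(ids)]           # ignore missing identifiers
  prefixes <- sub(":.*$", "", ids)  # keep only the text before the first ":"
  sort(table(prefixes), decreasing = TRUE)
}

# IDs copied from the first five rows of the table printed above.
ids <- c("SLB:130062", "SLB:51720", "SLB:130062", "SLB:130708", "SLB:30411")
counts <- id_prefix_counts(ids)
print(counts)
```

On a collected table the same helper could be applied to the whole `acceptedNameUsageID` column. If the table were a genuine IUCN one, the dominant prefix would presumably be `IUCN` rather than `SLB`.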

I'm not sure whether you can reproduce this issue, but here is my session info:

> sessionInfo()
R version 4.0.5 (2021-03-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.5  taxadb_0.1.2

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.6        pillar_1.5.1      compiler_4.0.5    dbplyr_2.1.1      prettyunits_1.1.1 tools_4.0.5      
 [7] progress_1.2.2    contentid_0.0.9   bit_4.0.4         memoise_2.0.0     jsonlite_1.7.2    RSQLite_2.2.5    
[13] lifecycle_1.0.0   tibble_3.1.0      pkgconfig_2.0.3   rlang_0.4.10      cli_2.4.0         DBI_1.1.1        
[19] curl_4.3          fastmap_1.1.0     duckdb_0.2.5      arkdb_0.0.12      httr_1.4.2        generics_0.1.0   
[25] fs_1.5.0          vctrs_0.3.7       askpass_1.1       hms_1.0.0         rappdirs_0.3.3    bit64_4.0.5      
[31] tidyselect_1.1.0  glue_1.4.2        R6_2.5.0          fansi_0.4.2       purrr_0.3.4       readr_1.4.0      
[37] blob_1.2.1        magrittr_2.0.1    ellipsis_0.3.1    assertthat_0.2.1  utf8_1.2.1        stringi_1.5.3    
[43] openssl_1.4.3     cachem_1.0.4      crayon_1.4.1

Thank you very much for the help.

cboettig commented 3 years ago

Whoops, sounds like some of the 2019 links have been crossed. Looking into it.

brunobrr commented 2 years ago

Hi,

I'm using the filter_name function with the iucn database (version 2022), but it returns an error.

sci_name <- c("Puma concolor", "Ilex aquifolium", "Mergus octosetaceus", "Psidium guineense", "Aaptos adriatica")
teste <- taxadb::filter_name(name = sci_name, provider = "iucn", version = 2022)

Error in .local(conn, statement, ...) :
  duckdb_prepare_R: Failed to prepare query SELECT "taxonId,kingdom,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,scientificName,vernacularName,nameAccordingTo,acceptedNameUsageId,population,category", LOWER("input") AS "input"
FROM (SELECT "taxonId,kingdom,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,scientificName,vernacularName,nameAccordingTo,acceptedNameUsageId,population,category" FROM "2022_dwc_iucn") "q01"
Error: Binder Error: Referenced column "input" not found in FROM clause!
Candidate bindings: "q01.taxonId,kingdom,phylum,class,order,family,genus,specificEpithet,infraspecificEpithet,scientificName,vernacularName,nameAccordingTo,acceptedNameUsageId,population,category"
LINE 1: ...ameUsageId,population,category", LOWER("input") AS "input"

I guess this occurs because the iucn 2022 database is not being read correctly, probably due to a problem with the delimiter used to separate columns: the whole header shows up as a single column name.

ddd_2022 <- 
  taxadb::taxa_tbl(provider = "iucn", version = 2022) %>% collect()
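
The symptom described here, a table that arrives as one column whose name still contains the delimiter, can be checked for programmatically. Below is a small base-R sketch; `looks_unsplit` is a hypothetical helper, not a taxadb function, and the toy data frames only imitate the shape of the affected table.

```r
# Hypothetical check (not a taxadb function): TRUE when a table has exactly
# one column and that column's name still contains the delimiter.
looks_unsplit <- function(df, delim = ",") {
  ncol(df) == 1L && grepl(delim, names(df)[1], fixed = TRUE)
}

# Toy table imitating the mis-parsed shape: one column, header never split.
bad <- data.frame(x = "IUCN:3,Animalia,Mollusca", stringsAsFactors = FALSE)
names(bad) <- "taxonId,kingdom,phylum"

# Toy table with properly separated columns.
good <- data.frame(taxonId = "IUCN:3", kingdom = "Animalia",
                   stringsAsFactors = FALSE)

print(looks_unsplit(bad))
print(looks_unsplit(good))
```

The same check should work on the collected `ddd_2022` object, since it relies only on `ncol()` and `names()`.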


cboettig commented 2 years ago

yup, thanks for the bug report! we'll fix asap

kguidonimartins commented 2 years ago

Hi Carl, is there any update on this?

I also noticed that taxa_tbl cannot parse the IUCN database correctly.

if (!require("tidyverse")) install.packages("tidyverse")
#> Loading required package: tidyverse
if (!require("taxadb")) install.packages("taxadb")
#> Loading required package: taxadb
packageVersion("taxadb")
#> [1] '0.1.5'

c("itis", "gbif", "iucn") %>%
  purrr::map(
    ~ taxadb::taxa_tbl(.x)
  )
#> [[1]]
#> # Source:   table<2022_dwc_itis> [?? x 15]
#> # Database: duckdb_connection
#>    taxonID scientificName taxonRank acceptedNameUsa… taxonomicStatus update_date
#>    <chr>   <chr>          <chr>     <chr>            <chr>           <chr>
#>  1 ITIS:51 Schizomycetes  class     ITIS:50          synonym         2015-03-02
#>  2 ITIS:52 Archangiaceae  family    ITIS:50          synonym         2015-03-02
#>  3 ITIS:54 Rhodobacterii… suborder  ITIS:50          synonym         2015-03-02
#>  4 ITIS:55 Pseudomonadin… suborder  ITIS:50          synonym         2015-03-02
#>  5 ITIS:56 Nitrobacterac… family    ITIS:50          synonym         2015-03-02
#>  6 ITIS:58 Nitrobacter a… species   ITIS:50          synonym         2015-03-02
#>  7 ITIS:59 Nitrobacter f… species   ITIS:50          synonym         2015-03-02
#>  8 ITIS:60 Nitrobacter o… species   ITIS:50          synonym         2015-03-02
#>  9 ITIS:61 Nitrobacter p… species   ITIS:50          synonym         2015-03-02
#> 10 ITIS:62 Nitrobacter p… species   ITIS:50          synonym         2015-03-02
#> # … with more rows, and 9 more variables: kingdom <chr>, phylum <chr>,
#> #   class <chr>, order <chr>, family <chr>, genus <chr>, specificEpithet <chr>,
#> #   infraspecificEpithet <chr>, vernacularName <chr>
#>
#> [[2]]
#> # Source:   table<2022_dwc_gbif> [?? x 17]
#> # Database: duckdb_connection
#>    taxonID     scientificName taxonRank taxonomicStatus acceptedNameUsa… kingdom
#>    <chr>       <chr>          <chr>     <chr>           <chr>            <chr>
#>  1 GBIF:10109… <NA>           genus     accepted        GBIF:10109647    Plantae
#>  2 GBIF:10435… <NA>           unranked  accepted        GBIF:10435335    Animal…
#>  3 GBIF:10342… <NA>           unranked  accepted        GBIF:10342269    Animal…
#>  4 GBIF:10330… <NA>           unranked  accepted        GBIF:10330105    Animal…
#>  5 GBIF:99986… <NA>           unranked  accepted        GBIF:9998601     Animal…
#>  6 GBIF:10434… <NA>           unranked  accepted        GBIF:10434347    Animal…
#>  7 GBIF:10420… <NA>           unranked  accepted        GBIF:10420560    Animal…
#>  8 GBIF:10459… <NA>           unranked  accepted        GBIF:10459774    Animal…
#>  9 GBIF:10454… <NA>           unranked  accepted        GBIF:10454699    Animal…
#> 10 GBIF:10257… <NA>           unranked  accepted        GBIF:10257898    Animal…
#> # … with more rows, and 11 more variables: phylum <chr>, class <chr>,
#> #   order <chr>, family <chr>, genus <chr>, specificEpithet <chr>,
#> #   infraspecificEpithet <chr>, parentNameUsageID <chr>,
#> #   originalNameUsageID <chr>, scientificNameAuthorship <chr>,
#> #   vernacularName <chr>
#>
#> [[3]]
#> # Source:   table<2022_dwc_iucn> [?? x 1]
#> # Database: duckdb_connection
#>    `taxonId,kingdom,phylum,class,order,family,genus,specificEpithet,infraspeci…`
#>    <chr>
#>  1 "IUCN:3,Animalia,Mollusca,Gastropoda,Stylommatophora,Endodontidae,Aaadonta,a…
#>  2 "IUCN:4,Animalia,Mollusca,Gastropoda,Stylommatophora,Endodontidae,Aaadonta,c…
#>  3 "IUCN:5,Animalia,Mollusca,Gastropoda,Stylommatophora,Endodontidae,Aaadonta,f…
#>  4 "IUCN:6,Animalia,Mollusca,Gastropoda,Stylommatophora,Endodontidae,Aaadonta,i…
#>  5 "IUCN:7,Animalia,Mollusca,Gastropoda,Stylommatophora,Endodontidae,Aaadonta,k…
#>  6 "IUCN:8,Animalia,Mollusca,Gastropoda,Stylommatophora,Endodontidae,Aaadonta,p…
#>  7 "IUCN:9,Animalia,Chordata,Actinopterygii,Cypriniformes,Cyprinidae,Aaptosyax,…
#>  8 "IUCN:18,Animalia,Chordata,Mammalia,Rodentia,Abrocomidae,Abrocoma,boliviensi…
#>  9 "IUCN:20,Animalia,Chordata,Reptilia,Squamata,Anguidae,Abronia,montecristoi,N…
#> 10 "IUCN:43,Animalia,Arthropoda,Insecta,Odonata,Aeshnidae,Acanthaeschna,victori…
#> # … with more rows

Created on 2022-06-23 by the reprex package (v2.0.1)
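
Until the upstream data is fixed, a possible stopgap is to re-split the single column locally after collecting it. This is only a sketch under assumptions: `resplit_single_column` is a hypothetical helper, and it assumes the column name is the original comma-separated header and every value is one comma-separated record. The toy input reuses the first fields of two rows from the output above, shortened to three columns.

```r
# Hypothetical helper: rebuild a proper data frame from a one-column table
# whose column name is the CSV header and whose values are CSV records.
resplit_single_column <- function(df, sep = ",") {
  stopifnot(ncol(df) == 1L)
  # Put the header line back in front of the data lines, then parse as CSV.
  lines <- c(names(df)[1], as.character(df[[1]]))
  utils::read.csv(text = lines, sep = sep, stringsAsFactors = FALSE,
                  na.strings = "NA", check.names = FALSE)
}

# Toy input with the same shape as the mis-parsed IUCN table.
raw <- data.frame(
  x = c("IUCN:3,Animalia,Mollusca", "IUCN:18,Animalia,Chordata"),
  stringsAsFactors = FALSE
)
names(raw) <- "taxonId,kingdom,phylum"

fixed <- resplit_single_column(raw)
str(fixed)
```

This operates on an in-memory data frame, so the table has to be collected first. It does not repair the local database itself, so functions such as `filter_name` that query the database directly would still fail.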

cboettig commented 2 years ago

thanks, yeah, we're overdue for an update on IUCN data, my apologies. IUCN data sources are somewhat less publicly accessible than records like COL or NCBI, so this needs some more manual tinkering.

Should be able to get to this in the next few weeks. thanks for the ping!