ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

taxize package, gnr_resolve(), Error: Request Entity Too Large (HTTP 413) #899

Open TimWOceanSciences opened 2 years ago

TimWOceanSciences commented 2 years ago

Hello

I have successfully run gnr_resolve() on a data set with a few hundred rows to get matched names. However, I'm getting the following error (Error: Request Entity Too Large (HTTP 413)) when running my bigger data set (79,298 rows), even with http = "post". Is there any way I can easily overcome this? My data set will keep growing, so not being able to run large data sets will cause me a real headache. I believe the wormsbynames() function of the worms package, which I'm also using, processes the data in chunks to avoid this. Does taxize's gnr_resolve() have an equivalent? What is the maximum data set size gnr_resolve() can handle? This is what I ran:

Taxize_test <- gnr_resolve(Formatted_Benthic_Biomass_Data_WW_TW_FINAL$Nomen,
                           resolve_once = FALSE, best_match_only = TRUE,
                           canonical = TRUE, http = "post", fields = "all",
                           preferred_data_sources = 9)
Error: Request Entity Too Large (HTTP 413)

Any help greatly appreciated.

session_info()
─ Session info ───────────────────────────────────────────
 setting  value
 version  R version 4.2.0 (2022-04-22 ucrt)
 os       Windows 10 x64 (build 19044)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United Kingdom.utf8
 ctype    English_United Kingdom.utf8
 tz       Europe/London
 date     2022-09-30
 rstudio  2022.02.2+485 Prairie Trillium (desktop)
 pandoc   NA

─ Packages ───────────────────────────────────────────────
 package     version date (UTC) lib source
 ape         5.6-2   2022-03-02 [1] CRAN (R 4.2.1)
 assertthat  0.2.1   2019-03-21 [1] CRAN (R 4.2.0)
 backports   1.4.1   2021-12-13 [1] CRAN (R 4.2.0)
 bit         4.0.4   2020-08-04 [1] CRAN (R 4.2.0)
 bit64       4.0.5   2020-08-30 [1] CRAN (R 4.2.0)
 bold        1.2.0   2021-05-11 [1] CRAN (R 4.2.1)
 broom       0.8.0   2022-04-13 [1] CRAN (R 4.2.0)
 cachem      1.0.6   2021-08-19 [1] CRAN (R 4.2.0)
 callr       3.7.0   2021-04-20 [1] CRAN (R 4.2.0)
 cellranger  1.1.0   2016-07-27 [1] CRAN (R 4.2.0)
 cli         3.3.0   2022-04-25 [1] CRAN (R 4.2.0)
 codetools   0.2-18  2020-11-04 [2] CRAN (R 4.2.0)
 colorspace  2.0-3   2022-02-21 [1] CRAN (R 4.2.0)
 conditionz  0.1.0   2019-04-24 [1] CRAN (R 4.2.1)
 crayon      1.5.1   2022-03-26 [1] CRAN (R 4.2.0)
 crul        1.3     2022-09-03 [1] CRAN (R 4.2.1)
 curl        4.3.2   2021-06-23 [1] CRAN (R 4.2.0)
 data.table  1.14.2  2021-09-27 [1] CRAN (R 4.2.0)
 DBI         1.1.2   2021-12-20 [1] CRAN (R 4.2.0)
 dbplyr      2.1.1   2021-04-06 [1] CRAN (R 4.2.0)
 devtools    2.4.4   2022-07-20 [1] CRAN (R 4.2.1)
 digest      0.6.29  2021-12-01 [1] CRAN (R 4.2.0)
 dplyr       1.0.9   2022-04-28 [1] CRAN (R 4.2.0)
 ellipsis    0.3.2   2021-04-29 [1] CRAN (R 4.2.0)
 fansi       1.0.3   2022-03-24 [1] CRAN (R 4.2.0)
 fastmap     1.1.0   2021-01-25 [1] CRAN (R 4.2.0)
 forcats     0.5.1   2021-01-27 [1] CRAN (R 4.2.0)
 foreach     1.5.2   2022-02-02 [1] CRAN (R 4.2.1)
 foreign     0.8-82  2022-01-16 [2] CRAN (R 4.2.0)
 fs          1.5.2   2021-12-08 [1] CRAN (R 4.2.0)
 generics    0.1.2   2022-01-31 [1] CRAN (R 4.2.0)
 ggplot2     3.3.6   2022-05-03 [1] CRAN (R 4.2.0)
 glue        1.6.2   2022-02-24 [1] CRAN (R 4.2.0)
 gtable      0.3.0   2019-03-25 [1] CRAN (R 4.2.0)
 haven       2.5.0   2022-04-15 [1] CRAN (R 4.2.0)
 hms         1.1.1   2021-09-26 [1] CRAN (R 4.2.0)
 htmltools   0.5.2   2021-08-25 [1] CRAN (R 4.2.0)
 htmlwidgets 1.5.4   2021-09-08 [1] CRAN (R 4.2.0)
 httpcode    0.3.0   2020-04-10 [1] CRAN (R 4.2.1)
 httpuv      1.6.5   2022-01-05 [1] CRAN (R 4.2.0)
 httr        1.4.3   2022-05-04 [1] CRAN (R 4.2.0)
 iterators   1.0.14  2022-02-05 [1] CRAN (R 4.2.1)
 jsonlite    1.8.0   2022-02-22 [1] CRAN (R 4.2.0)
 later       1.3.0   2021-08-18 [1] CRAN (R 4.2.0)
 lattice     0.20-45 2021-09-22 [2] CRAN (R 4.2.0)
 lifecycle   1.0.1   2021-09-24 [1] CRAN (R 4.2.0)
 lubridate   1.8.0   2021-10-07 [1] CRAN (R 4.2.0)
 magrittr    2.0.3   2022-03-30 [1] CRAN (R 4.2.0)
 maptools    1.1-4   2022-04-17 [1] CRAN (R 4.2.0)
 memoise     2.0.1   2021-11-26 [1] CRAN (R 4.2.1)
 mime        0.12    2021-09-28 [1] CRAN (R 4.2.0)
 miniUI      0.1.1.1 2018-05-18 [1] CRAN (R 4.2.1)
 modelr      0.1.8   2020-05-19 [1] CRAN (R 4.2.0)
 munsell     0.5.0   2018-06-12 [1] CRAN (R 4.2.0)
 nlme        3.1-157 2022-03-25 [2] CRAN (R 4.2.0)
 pillar      1.7.0   2022-02-01 [1] CRAN (R 4.2.0)
 pkgbuild    1.3.1   2021-12-20 [1] CRAN (R 4.2.1)
 pkgconfig   2.0.3   2019-09-22 [1] CRAN (R 4.2.0)
 pkgload     1.3.0   2022-06-27 [1] CRAN (R 4.2.1)
 plyr        1.8.7   2022-03-24 [1] CRAN (R 4.2.0)
 prettyunits 1.1.1   2020-01-24 [1] CRAN (R 4.2.0)
 processx    3.5.3   2022-03-25 [1] CRAN (R 4.2.0)
 profvis     0.3.7   2020-11-02 [1] CRAN (R 4.2.1)
 promises    1.2.0.1 2021-02-11 [1] CRAN (R 4.2.0)
 ps          1.7.0   2022-04-23 [1] CRAN (R 4.2.0)
 purrr       0.3.4   2020-04-17 [1] CRAN (R 4.2.0)
 R6          2.5.1   2021-08-19 [1] CRAN (R 4.2.0)
 raster      3.5-15  2022-01-22 [1] CRAN (R 4.2.0)
 Rcpp        1.0.8.3 2022-03-17 [1] CRAN (R 4.2.0)
 readr       2.1.2   2022-01-30 [1] CRAN (R 4.2.0)
 readxl      1.4.0   2022-03-28 [1] CRAN (R 4.2.0)
 remotes     2.4.2   2021-11-30 [1] CRAN (R 4.2.1)
 reprex      2.0.1   2021-08-05 [1] CRAN (R 4.2.0)
 reshape     0.8.9   2022-04-12 [1] CRAN (R 4.2.1)
 rgdal       1.5-32  2022-05-09 [1] CRAN (R 4.2.0)
 rlang       1.0.4   2022-07-12 [1] CRAN (R 4.2.1)
 rstudioapi  0.13    2020-11-12 [1] CRAN (R 4.2.0)
 rvest       1.0.2   2021-10-16 [1] CRAN (R 4.2.0)
 scales      1.2.0   2022-04-13 [1] CRAN (R 4.2.0)
 sessioninfo 1.2.2   2021-12-06 [1] CRAN (R 4.2.1)
 shiny       1.7.1   2021-10-02 [1] CRAN (R 4.2.0)
 sp          1.4-7   2022-04-20 [1] CRAN (R 4.2.0)
 stringi     1.7.6   2021-11-29 [1] CRAN (R 4.2.0)
 stringr     1.4.0   2019-02-10 [1] CRAN (R 4.2.0)
 taxize      0.9.100 2022-04-22 [1] CRAN (R 4.2.1)
 terra       1.5-21  2022-02-17 [1] CRAN (R 4.2.0)
 tibble      3.1.7   2022-05-03 [1] CRAN (R 4.2.0)
 tidyr       1.2.0   2022-02-01 [1] CRAN (R 4.2.0)
 tidyselect  1.1.2   2022-02-21 [1] CRAN (R 4.2.0)
 tidyverse   1.3.1   2021-04-15 [1] CRAN (R 4.2.0)
 triebeard   0.3.0   2016-08-04 [1] CRAN (R 4.2.1)
 tzdb        0.3.0   2022-03-28 [1] CRAN (R 4.2.0)
 urlchecker  1.0.1   2021-11-30 [1] CRAN (R 4.2.1)
 urltools    1.7.3   2019-04-14 [1] CRAN (R 4.2.1)
 usethis     2.1.6   2022-05-25 [1] CRAN (R 4.2.1)
 utf8        1.2.2   2021-07-24 [1] CRAN (R 4.2.0)
 uuid        1.1-0   2022-04-19 [1] CRAN (R 4.2.0)
 vctrs       0.4.1   2022-04-13 [1] CRAN (R 4.2.0)
 vroom       1.5.7   2021-11-30 [1] CRAN (R 4.2.0)
 withr       2.5.0   2022-03-03 [1] CRAN (R 4.2.0)
 worms       0.2.2   2018-04-25 [1] CRAN (R 4.2.1)
 worrms      0.4.2   2020-07-08 [1] CRAN (R 4.2.1)
 xml2        1.3.3   2021-11-30 [1] CRAN (R 4.2.0)
 xtable      1.8-4   2019-04-21 [1] CRAN (R 4.2.0)
 zoo         1.8-10  2022-04-15 [1] CRAN (R 4.2.1)

TimWOceanSciences commented 2 years ago

I got round it with the code below.

# need to split the data table for taxize due to size
library(dplyr)  # for bind_rows()

chunk <- 1000
n <- nrow(Formatted_Benthic_Biomass_Data_WW_TW_FINAL)
r <- rep(1:ceiling(n / chunk), each = chunk)[1:n]
d <- split(Formatted_Benthic_Biomass_Data_WW_TW_FINAL, r)

output <- list()
for (i in seq_along(d)) {
  Taxize_test <- gnr_resolve(d[[i]]$Nomen, resolve_once = FALSE,
                             best_match_only = TRUE, canonical = TRUE,
                             http = "post", fields = "all",
                             preferred_data_sources = 9)
  output[[i]] <- Taxize_test
}

# rejoin tibbles of gnr_resolve
Taxize_test <- bind_rows(output)
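
For anyone who wants to reuse the chunking workaround, here is a minimal sketch of it wrapped in a function. The helper name resolve_in_chunks() and its arguments are hypothetical and not part of taxize. The resolver is passed in as a function so the batching logic can be tried without network access. Dropping duplicate and NA names before batching is an extra step that the workaround above does not do; it assumes the results can be joined back to the original rows by name afterwards. The final rbind() assumes every batch returns the same columns; if that does not hold, dplyr::bind_rows() as used above may be the safer choice.

```r
# Hypothetical helper (not part of taxize): resolve names in fixed-size batches
# so that no single request carries the whole data set.
resolve_in_chunks <- function(names, chunk_size = 1000, resolver, ...) {
  stopifnot(is.function(resolver), chunk_size >= 1)

  # Assumption: duplicates and NAs can be dropped up front to shrink the
  # number of names sent, then joined back to the original rows later.
  names <- unique(names[!is.na(names)])
  if (length(names) == 0) return(data.frame())

  # Batch index for every name: 1, 1, ..., 2, 2, ...
  batch_id <- ceiling(seq_along(names) / chunk_size)
  batches <- split(names, batch_id)

  # Call the resolver once per batch, forwarding any extra arguments.
  results <- lapply(batches, function(batch) resolver(batch, ...))

  # Stack the per-batch results (assumes identical columns in every batch).
  do.call(rbind, results)
}

# Usage with the call from this thread (needs taxize and network access):
# library(taxize)
# Taxize_test <- resolve_in_chunks(
#   Formatted_Benthic_Biomass_Data_WW_TW_FINAL$Nomen,
#   chunk_size = 1000, resolver = gnr_resolve,
#   resolve_once = FALSE, best_match_only = TRUE, canonical = TRUE,
#   http = "post", fields = "all", preferred_data_sources = 9
# )
```

The chunk size of 1000 is simply the value that worked above; the actual request size limit of the service is not stated in this thread, so a smaller value may be needed for long names.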