ropensci / taxize

A taxonomic toolbelt for R
https://docs.ropensci.org/taxize

sci2comm Error URL rejected: Malformed input to a URL function #931

Closed Kavega closed 4 months ago

Kavega commented 4 months ago

I currently get an error when trying to get the common names for a species list. If I pass the species in one at a time it works fine, and it also works if my dataset contains just a single species name. However, once there are two (even two that work fine on their own), I get the error below. The output shows success for each of the two species alone, followed by the failure when both are in one dataset.

> comm_nam<-sci2comm(Species1, db="ncbi")

══  1 queries  ═══════════════

Retrieving data for taxon 'Vulpes vulpes'

✔  Found:  Vulpes+vulpes
══  Results  ═════════════════

• Total: 1 
• Found: 1 
• Not Found: 0

> comm_nam<-sci2comm(Species2, db="ncbi")
══  1 queries  ═══════════════

Retrieving data for taxon 'Rana temporaria'

✔  Found:  Rana+temporaria
══  Results  ═════════════════

• Total: 1 
• Found: 1 
• Not Found: 0

> comm_nam<-sci2comm(SpeciesCombo, db="ncbi")
══  2 queries  ═══════════════

Retrieving data for taxon 'Rana temporaria'

✔  Found:  Rana+temporaria

Retrieving data for taxon 'Vulpes vulpes'

✔  Found:  Vulpes+vulpes
══  Results  ═════════════════

• Total: 2 
• Found: 2 
• Not Found: 0
Error in curl::curl_fetch_memory(x$url$url, handle = x$url$handle) : 
  URL rejected: Malformed input to a URL function

zachary-foster commented 4 months ago

Hello, thanks for the report. It seems to work for me:

library(taxize)
sci2comm(c('Rana temporaria', 'Vulpes vulpes'), db="ncbi")
#> ══  1 queries  ═══════════════
#> 
#> Retrieving data for taxon 'Rana temporaria'
#> ✔  Found:  Rana+temporaria
#> ══  Results  ═════════════════
#> 
#> • Total: 1 
#> • Found: 1 
#> • Not Found: 0
#> ══  1 queries  ═══════════════
#> 
#> Retrieving data for taxon 'Vulpes vulpes'
#> ✔  Found:  Vulpes+vulpes
#> ══  Results  ═════════════════
#> 
#> • Total: 1 
#> • Found: 1 
#> • Not Found: 0
#> $`Rana temporaria`
#> [1] "common frog"
#> 
#> $`Vulpes vulpes`
#> [1] "red fox"

Created on 2024-05-23 with reprex v2.1.0

Can you send me your sessionInfo()?

Kavega commented 4 months ago

Sure, no problem.

R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 22631)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8    LC_MONETARY=English_United States.utf8 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] rrapply_1.2.6   lubridate_1.9.3 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.4     purrr_1.0.2     readr_2.1.4     tidyr_1.3.0     tibble_3.2.1    ggplot2_3.4.4   tidyverse_2.0.0
[12] taxizedb_0.3.1  taxize_0.9.100  emmeans_1.9.0  

loaded via a namespace (and not attached):
 [1] rredlist_0.7.1     tidyselect_1.2.0   blob_1.2.4         urltools_1.7.3     fastmap_1.1.1      TH.data_1.1-2      digest_0.6.33      estimability_1.4.1 timechange_0.2.0  
[10] lifecycle_1.0.4    survival_3.5-7     terra_1.7-55       RSQLite_2.3.6      magrittr_2.0.3     compiler_4.3.2     rlang_1.1.2        tools_4.3.2        utf8_1.2.4        
[19] data.table_1.14.8  conditionz_0.1.0   bit_4.0.5          sp_2.1-2           curl_5.2.0         xml2_1.3.6         multcomp_1.4-25    httpcode_0.3.0     withr_3.0.0       
[28] triebeard_0.4.1    grid_4.3.2         fansi_1.0.5        xtable_1.8-4       colorspace_2.1-0   scales_1.3.0       iterators_1.0.14   MASS_7.3-60        crul_1.4.2        
[37] cli_3.6.1          mvtnorm_1.2-4      crayon_1.5.2       generics_0.1.3     rstudioapi_0.15.0  httr_1.4.7         tzdb_0.4.0         DBI_1.2.1          ape_5.7-1         
[46] cachem_1.0.8       splines_4.3.2      parallel_4.3.2     soiltexture_1.5.1  vctrs_0.6.5        Matrix_1.6-1.1     sandwich_3.1-0     jsonlite_1.8.8     hms_1.1.3         
[55] bit64_4.0.5        foreach_1.5.2      glue_1.6.2         codetools_0.2-19   stringi_1.8.2      gtable_0.3.4       munsell_0.5.0      pillar_1.9.0       rappdirs_0.3.3    
[64] R6_2.5.1           hoardr_0.5.4       dbplyr_2.4.0       tcltk_4.3.2        bold_1.3.0         lattice_0.21-9     memoise_2.0.1      Rcpp_1.0.11        uuid_1.2-0        
[73] coda_0.19-4        nlme_3.1-163       zoo_1.8-12         pkgconfig_2.0.3   

However, I will say that putting in the species manually does work for me as well. What doesn't work is loading a csv which has two species and then running it.

zachary-foster commented 4 months ago

What doesn't work is loading a csv which has two species and then running it.

Oh, maybe it has to do with the formatting of the csv. Can you send a file that reproduces this error?
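
As a rough illustration of what "formatting of the csv" could mean in practice, the sketch below inspects a name column before it is passed to sci2comm(). It assumes a column called scientific_name (the name used later in this thread) and uses a small in-memory stand-in for the file. The helper check_species_column is hypothetical and not part of taxize.

```r
# Hypothetical stand-in for the CSV contents; the real file is Species_list.csv.
# The second name has a trailing space on purpose.
csv_text <- "scientific_name\nRana temporaria\nVulpes vulpes \n"

# read.csv() returns a data frame, even when the file has a single column.
data <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Hypothetical helper: report common formatting problems in a name column.
check_species_column <- function(x) {
  list(
    is_character   = is.character(x),
    has_whitespace = any(x != trimws(x), na.rm = TRUE),
    has_non_ascii  = any(grepl("[^ -~]", x, perl = TRUE)),
    has_missing    = any(is.na(x) | x == "")
  )
}

str(data)
report <- check_species_column(data$scientific_name)
print(report)
```

None of these checks turned out to be the cause in this thread, but they are cheap to run when a lookup behaves differently for file input than for names typed by hand.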

Kavega commented 4 months ago

I guess it must be something to do with that, then. Here is the very simple file: Species_list.csv. I have also tried with no header row. Sorry if this is a very obvious fix that I've messed up!

zachary-foster commented 4 months ago

This also works for me. Does this code produce an error for you?

library(taxize)
path <- "~/Downloads/Species_list.csv"
data <- read.csv(path)
sci2comm(data$scientific_name, db="ncbi")
#> ══  1 queries  ═══════════════
#> 
#> Retrieving data for taxon 'Rana temporaria'
#> ✔  Found:  Rana+temporaria
#> ══  Results  ═════════════════
#> 
#> • Total: 1 
#> • Found: 1 
#> • Not Found: 0
#> ══  1 queries  ═══════════════
#> 
#> Retrieving data for taxon 'Vulpes vulpes'
#> ✔  Found:  Vulpes+vulpes
#> ══  Results  ═════════════════
#> 
#> • Total: 1 
#> • Found: 1 
#> • Not Found: 0
#> $`Rana temporaria`
#> [1] "common frog"
#> 
#> $`Vulpes vulpes`
#> [1] "red fox"

Created on 2024-05-23 with reprex v2.1.0

Kavega commented 4 months ago

I see! Apparently the entire issue was that I passed in the entire dataset instead of specifying the column. I ran the code with just sci2comm(scientific_name, db="ncbi"), which worked when there was one species. With two species it still read the data and found both, yet it would produce that error when attempting to output the results. Thank you, sci2comm(data$scientific_name, db="ncbi") works for me. We can close the issue.
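
The difference between passing a whole data frame and passing one column can be sketched without any network access. This is a minimal illustration, assuming a data frame with a scientific_name column like the one read from Species_list.csv. The helper as_name_vector is hypothetical and not part of taxize, and the thread does not confirm how sci2comm() handles a data frame internally.

```r
# Hypothetical stand-in for the data read from Species_list.csv.
data <- data.frame(
  scientific_name = c("Rana temporaria", "Vulpes vulpes"),
  stringsAsFactors = FALSE
)

# The whole data frame is a list of columns, not a vector of names.
is.character(data)                  # FALSE
is.character(data$scientific_name)  # TRUE

# Hypothetical guard: always hand a plain character vector to the lookup.
as_name_vector <- function(x, column = "scientific_name") {
  if (is.data.frame(x)) {
    if (!column %in% names(x)) {
      stop("Column '", column, "' not found in data frame", call. = FALSE)
    }
    x <- x[[column]]
  }
  if (!is.character(x)) {
    stop("Species names must be a character vector", call. = FALSE)
  }
  x
}

species <- as_name_vector(data)
print(species)
# sci2comm(species, db = "ncbi")  # network call, as used in this thread
```

A guard like this fails early with a readable message instead of letting a non-vector input travel further into the lookup.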

zachary-foster commented 4 months ago

OK, glad it was resolved.