ropensci / bold

Interface to the Bold Systems barcode webservice
https://docs.ropensci.org/bold

Problem with UTF-8 encoding #81

Closed chleeb closed 2 years ago

chleeb commented 2 years ago

I recently updated R, RStudio and all my R packages. Since then I observed a bug in bold which might be linked to the UTF-8 encoding. When I run

requested_bold_records <- bold_seqspec(id = "GBBSP1585-15")

requested_bold_records$copyright_licenses is now "CreativeCommons \u0096 Attribution Share-Alike (by-sa) 818". Before updating everything, it was "CreativeCommons – Attribution Share-Alike (by-sa) 818". Since \u0096 now appears where the dash used to be, it might be a problem with the UTF-8 encoding.

Any ideas what's going on and how to fix it? Thanks!

Session Info

```r
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister
 Normal:  Inversion
 Sample:  Rounding

locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252    LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C                    LC_TIME=German_Germany.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] bold_1.2.0

loaded via a namespace (and not attached):
 [1] compiler_4.1.3  magrittr_2.0.2  plyr_1.8.6      R6_2.5.1        tools_4.1.3     httpcode_0.3.0  curl_4.3.2
 [8] urltools_1.7.3  Rcpp_1.0.8.3    triebeard_0.3.0 xml2_1.3.3      stringi_1.7.6   reshape_0.8.8   crul_1.2.0
[15] stringr_1.4.0   jsonlite_1.8.0
```

salix-d commented 2 years ago

This seems to be an issue with the `Encoding<-` assignment function specifically. I can't check whether it's Windows-specific at the moment. With the enc2utf8() function, or with the encoding specified directly in read.delim(), the text displays properly, so I'll fix the function to do that instead.

In the meantime, to keep it in UTF-8 and have it display properly, you can change the encoding to "latin1" (or even just "") and back to UTF-8 with enc2utf8(`Encoding<-`(requested_bold_records$copyright_licenses, ""))
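
As a rough illustration of why re-declaring the encoding helps, here is a minimal sketch that does not call bold at all. It assumes the returned bytes are Windows-1252, where byte 0x96 is the en dash; the `license` string is a hypothetical stand-in for the affected field, and iconv() with an explicit source encoding is used as a locale-independent alternative to the enc2utf8() workaround, not as the package's actual fix.

```r
# Hypothetical stand-in for the affected field: byte 0x96 is the en dash in
# Windows-1252 (assumed source encoding) but is not valid UTF-8 on its own.
raw_bytes <- c(charToRaw("CreativeCommons "), as.raw(0x96),
               charToRaw(" Attribution"))
license <- rawToChar(raw_bytes)

# Declaring the bytes as UTF-8 only changes the mark, not the bytes,
# so the string is still not valid UTF-8.
Encoding(license) <- "UTF-8"
validUTF8(license)

# iconv() converts the underlying bytes from the stated source encoding,
# regardless of the declared mark, so the en dash is recovered.
fixed <- iconv(license, from = "CP1252", to = "UTF-8")
validUTF8(fixed)
```

The enc2utf8() workaround above relies on the native encoding of the session (Windows-1252 in the reported locale), whereas naming the source encoding explicitly should behave the same way on other platforms.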

cjfields commented 2 years ago

I'm seeing a related error:

> test <- bold_seqspec('Thecostraca')
Error in type.convert.default(data[[i]], as.is = as.is[i], dec = dec,  : 
  invalid multibyte string at '<a0>Oct<6f>meris angulosa'
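
For anyone trying to narrow down which values trigger this kind of error, the following is a small sketch using base R only. The `taxa` vector is hypothetical, and treating the stray byte as Latin-1 is an assumption based on the <a0> in the message; it is not a statement about what the BOLD API actually returns.

```r
# Hypothetical values: the second one starts with a stray 0xA0 byte
# (a non-breaking space in Latin-1), which is not valid UTF-8 on its own.
taxa <- c("Octomeris angulosa",
          rawToChar(c(as.raw(0xA0), charToRaw("Octomeris angulosa"))))

# validUTF8() flags elements whose bytes do not form valid UTF-8.
bad <- !validUTF8(taxa)
which(bad)

# Assumption: the offending values are Latin-1, so convert only those.
taxa[bad] <- iconv(taxa[bad], from = "latin1", to = "UTF-8")
all(validUTF8(taxa))
```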

Session info:

R version 4.1.3 (2022-03-10)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.3.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] dplyr_1.0.8   taxize_0.9.99 bold_1.2.0   

loaded via a namespace (and not attached):
 [1] zoo_1.8-9         tidyselect_1.1.2  xfun_0.30         purrr_0.3.4       lattice_0.20-45   vctrs_0.3.8       generics_0.1.2    htmltools_0.5.2   yaml_2.3.5       
[10] utf8_1.2.2        rlang_1.0.2       pillar_1.7.0      httpcode_0.3.0    glue_1.6.2        DBI_1.1.2         uuid_1.0-3        foreach_1.5.2     lifecycle_1.0.1  
[19] plyr_1.8.6        stringr_1.4.0     codetools_0.2-18  evaluate_0.15     knitr_1.37        fastmap_1.1.0     parallel_4.1.3    curl_4.3.2        fansi_1.0.2      
[28] triebeard_0.3.0   urltools_1.7.3    Rcpp_1.0.8        jsonlite_1.8.0    digest_0.6.29     stringi_1.7.6     grid_4.1.3        cli_3.2.0         tools_4.1.3      
[37] magrittr_2.0.2    tibble_3.1.6      crul_1.2.0        crayon_1.5.0      ape_5.6-2         pkgconfig_2.0.3   ellipsis_0.3.2    data.table_1.14.2 xml2_1.3.3       
[46] assertthat_0.2.1  rmarkdown_2.13    reshape_0.8.9     rstudioapi_0.13   iterators_1.0.14  R6_2.5.1          conditionz_0.1.0  nlme_3.1-155      compiler_4.1.3   
salix-d commented 2 years ago

I changed the encoding function, so it should work now. Let me know if that doesn't fix it.