ropensci / rcrossref

R client for various CrossRef APIs
https://docs.ropensci.org/rcrossref

cr_cn fails with some valid DOIs #225

Open bobmuscarella opened 2 years ago

bobmuscarella commented 2 years ago

rcrossref returns errors for some valid DOIs (a sample is below). The DOIs are valid, as confirmed on doi.org. Any ideas what is going on or how to fix it?

Please note that I am using the most recent dev version of rcrossref, and I have added my email to my .Renviron file as per the instructions on the rcrossref GitHub page.

Thanks for any help!

Session Info

```r
> library(rcrossref)
> sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur 10.16

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rcrossref_1.1.0.99

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.7        plyr_1.8.6        compiler_4.0.3    pillar_1.6.1
 [5] later_1.2.0       remotes_2.4.0     tools_4.0.3       digest_0.6.27
 [9] jsonlite_1.7.2    lifecycle_1.0.0   tibble_3.1.2      pkgconfig_2.0.3
[13] rlang_0.4.11      shiny_1.6.0       DBI_1.1.1         crul_1.1.0
[17] curl_4.3.1        fastmap_1.1.0     xml2_1.3.2        stringr_1.4.0
[21] dplyr_1.0.6       generics_0.1.0    vctrs_0.3.8       htmlwidgets_1.5.3
[25] DT_0.18           tidyselect_1.1.1  glue_1.4.2        httpcode_0.3.0
[29] R6_2.5.1          fansi_0.5.0       purrr_0.3.4       magrittr_2.0.1
[33] promises_1.2.0.1  ellipsis_0.3.2    htmltools_0.5.1.1 assertthat_0.2.1
[37] mime_0.10         xtable_1.8-4      httpuv_1.6.1      utf8_1.2.1
[41] stringi_1.6.2     miniUI_0.1.1.1    crayon_1.4.1
```

```r
> cr_cn("10.1111/ddi.13378", "text")
Error in nchar(hh) : invalid multibyte string, element 1
> cr_cn("10.1111/btp.12905", "text")
Error in nchar(hh) : invalid multibyte string, element 1
> cr_cn('10.1038/s41597-020-00788-5')
Error in nchar(hh) : invalid multibyte string, element 1
> cr_cn("10.1111/geb.13346")
Warning message:
v1/works/10.1111/geb.13346/transform w/ (500) -
```

njahn82 commented 2 years ago

Thank you for raising this issue, @bobmuscarella. It seems the Crossref API does not encode its responses as UTF-8. I will alert Crossref about it. This issue is related to #221.
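
For readers unfamiliar with this class of error, here is a minimal base-R sketch of how `nchar()` can fail on a string whose bytes are not valid UTF-8. It is not the rcrossref or crul code. The Latin-1 byte sequence and the variable names are assumptions chosen purely for illustration; the actual bytes returned by Crossref may differ.

```r
# "Chytry" with an accented final letter, written as Latin-1 bytes (assumed
# example). The trailing 0xfd byte is not a valid UTF-8 sequence.
latin1_bytes <- as.raw(c(0x43, 0x68, 0x79, 0x74, 0x72, 0xfd))
header_value <- rawToChar(latin1_bytes)

# Wrongly declare the bytes to be UTF-8, as a client might when the server
# sends no charset or an incorrect one.
Encoding(header_value) <- "UTF-8"

validUTF8(header_value)              # FALSE
nchar(header_value, type = "bytes")  # 6; counting bytes does not decode

# Counting characters requires decoding, which is where R usually signals
# an "invalid multibyte string" error. Capture it instead of stopping.
char_count <- tryCatch(
  nchar(header_value),
  error = function(e) conditionMessage(e)
)

# Converting from the true source encoding yields a valid UTF-8 string.
repaired <- iconv(rawToChar(latin1_bytes), from = "latin1", to = "UTF-8")
validUTF8(repaired)  # TRUE
nchar(repaired)      # 6
```

The repair step only works when the real source encoding is known, which is why a fix on the server or in the HTTP client is preferable to guessing in user code.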

njahn82 commented 2 years ago

I asked the Crossref team about the header encoding: https://gitlab.com/crossref/issues/-/issues/1574

doomlab commented 2 years ago

@njahn82 - any update on this error?

njahn82 commented 2 years ago

Hi @doomlab, Crossref has not fixed the issue yet, but there has been an update to how crul, rcrossref's underlying HTTP client, deals with header encodings (see here). The good news: if you update to the most recent crul version on CRAN (1.2.0), the first three examples work. The fourth, cr_cn("10.1111/geb.13346"), still returns an internal server error (HTTP 500).

```r
library(rcrossref)
cr_cn("10.1111/ddi.13378", "text")
#> [1] "Pouteau, R., Biurrun, I., Brunel, C., Chytrý, M., Dawson, W., Essl, F., Fristoe, T., Haveman, R., Hobohm, C., Jansen, F., Kreft, H., Lenoir, J., Lenzner, B., Meyer, C., Moeslund, J. E., Pergl, J., Pyšek, P., Svenning, J., Thuiller, W., … van Kleunen, M. (2021). Potential alien ranges of European plants will shrink in the future, but less so for already naturalized than for not yet naturalized species. Diversity and Distributions, 27(11), 2063–2076. Portico. https://doi.org/10.1111/ddi.13378"

cr_cn("10.1111/btp.12905", "text")
#> [1] "Rech, A. R., Ollerton, J., Dalsgaard, B., Ré Jorge, L., Sandel, B., Svenning, J., Baronio, G. J., & Sazima, M. (2021). Population‐level plant pollination mode is influenced by Quaternary climate and pollinators. Biotropica, 53(2), 632–642. Portico. https://doi.org/10.1111/btp.12905"

cr_cn('10.1038/s41597-020-00788-5')
#> [1] "@article{Lundgren_2021,\n\tdoi = {10.1038/s41597-020-00788-5},\n\turl = {https://doi.org/10.1038%2Fs41597-020-00788-5},\n\tyear = 2021,\n\tmonth = {jan},\n\tpublisher = {Springer Science and Business Media {LLC}},\n\tvolume = {8},\n\tnumber = {1},\n\tauthor = {Erick J. Lundgren and Simon D. Schowanek and John Rowan and Owen Middleton and Rasmus {\\O}. Pedersen and Arian D. Wallach and Daniel Ramp and Matt Davis and Christopher J. Sandom and Jens-Christian Svenning},\n\ttitle = {Functional traits of the world's late Quaternary large-bodied avian and mammalian herbivores},\n\tjournal = {Scientific Data}\n}"

cr_cn('10.1111/geb.13346')
#> Warning: v1/works/10.1111/geb.13346/transform w/ (500) -
```

Created on 2022-02-20 by the reprex package (v2.0.0)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       macOS Big Sur 11.4
#>  system   aarch64, darwin20
#>  ui       X11
#>  language en
#>  collate  de_DE.UTF-8
#>  ctype    de_DE.UTF-8
#>  tz       Europe/Copenhagen
#>  date     2022-02-20
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version  date       lib source
#>  assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.1.0)
#>  backports     1.2.1    2020-12-09 [1] CRAN (R 4.1.0)
#>  cli           3.1.0    2021-10-27 [1] CRAN (R 4.1.1)
#>  crayon        1.4.2    2021-10-29 [1] CRAN (R 4.1.1)
#>  crul          1.2.0    2021-11-22 [1] CRAN (R 4.1.1)
#>  curl          4.3.2    2021-06-23 [1] CRAN (R 4.1.0)
#>  DBI           1.1.1    2021-01-15 [1] CRAN (R 4.1.0)
#>  digest        0.6.28   2021-09-23 [1] CRAN (R 4.1.1)
#>  dplyr         1.0.7    2021-06-18 [1] CRAN (R 4.1.0)
#>  DT            0.19     2021-09-02 [1] CRAN (R 4.1.1)
#>  ellipsis      0.3.2    2021-04-29 [1] CRAN (R 4.1.0)
#>  evaluate      0.14     2019-05-28 [1] CRAN (R 4.1.0)
#>  fansi         0.5.0    2021-05-25 [1] CRAN (R 4.1.0)
#>  fastmap       1.1.0    2021-01-25 [1] CRAN (R 4.1.0)
#>  fs            1.5.0    2020-07-31 [1] CRAN (R 4.1.0)
#>  generics      0.1.1    2021-10-25 [1] CRAN (R 4.1.1)
#>  glue          1.4.2    2020-08-27 [1] CRAN (R 4.1.0)
#>  highr         0.9      2021-04-16 [1] CRAN (R 4.1.0)
#>  htmltools     0.5.2    2021-08-25 [1] CRAN (R 4.1.1)
#>  htmlwidgets   1.5.4    2021-09-08 [1] CRAN (R 4.1.1)
#>  httpcode      0.3.0    2020-04-10 [1] CRAN (R 4.1.0)
#>  httpuv        1.6.3    2021-09-09 [1] CRAN (R 4.1.1)
#>  jsonlite      1.7.2    2020-12-09 [1] CRAN (R 4.1.0)
#>  knitr         1.37     2021-12-16 [1] CRAN (R 4.1.1)
#>  later         1.3.0    2021-08-18 [1] CRAN (R 4.1.1)
#>  lifecycle     1.0.1    2021-09-24 [1] CRAN (R 4.1.1)
#>  magrittr      2.0.1    2020-11-17 [1] CRAN (R 4.1.0)
#>  mime          0.12     2021-09-28 [1] CRAN (R 4.1.1)
#>  miniUI        0.1.1.1  2018-05-18 [1] CRAN (R 4.1.0)
#>  pillar        1.6.4    2021-10-18 [1] CRAN (R 4.1.0)
#>  pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.1.0)
#>  plyr          1.8.6    2020-03-03 [1] CRAN (R 4.1.0)
#>  promises      1.2.0.1  2021-02-11 [1] CRAN (R 4.1.0)
#>  purrr         0.3.4    2020-04-17 [1] CRAN (R 4.1.0)
#>  R6            2.5.1    2021-08-19 [1] CRAN (R 4.1.1)
#>  Rcpp          1.0.7    2021-07-07 [1] CRAN (R 4.1.0)
#>  rcrossref   * 1.1.0.99 2021-10-16 [1] Github (ropensci/rcrossref@319f34c)
#>  reprex        2.0.0    2021-04-02 [1] CRAN (R 4.1.0)
#>  rlang         0.4.12   2021-10-18 [1] CRAN (R 4.1.0)
#>  rmarkdown     2.11     2021-09-14 [1] CRAN (R 4.1.1)
#>  sessioninfo   1.1.1    2018-11-05 [1] CRAN (R 4.1.0)
#>  shiny         1.7.1    2021-10-02 [1] CRAN (R 4.1.1)
#>  stringi       1.7.5    2021-10-04 [1] CRAN (R 4.1.1)
#>  stringr       1.4.0    2019-02-10 [1] CRAN (R 4.1.0)
#>  styler        1.5.1    2021-07-13 [1] CRAN (R 4.1.0)
#>  tibble        3.1.5    2021-09-30 [1] CRAN (R 4.1.1)
#>  tidyselect    1.1.1    2021-04-30 [1] CRAN (R 4.1.0)
#>  triebeard     0.3.0    2016-08-04 [1] CRAN (R 4.1.0)
#>  urltools      1.7.3    2019-04-14 [1] CRAN (R 4.1.0)
#>  utf8          1.2.2    2021-07-24 [1] CRAN (R 4.1.0)
#>  vctrs         0.3.8    2021-04-29 [1] CRAN (R 4.1.0)
#>  withr         2.4.3    2021-11-30 [1] CRAN (R 4.1.1)
#>  xfun          0.29     2021-12-14 [1] CRAN (R 4.1.1)
#>  xml2          1.3.2    2020-04-23 [1] CRAN (R 4.1.0)
#>  xtable        1.8-4    2019-04-21 [1] CRAN (R 4.1.0)
#>  yaml          2.2.1    2020-02-01 [1] CRAN (R 4.1.0)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library
```
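
Until the server-side problem is resolved, a batch of DOIs can still be processed by recording each failure instead of letting it interrupt the run. The sketch below is one possible approach, not part of rcrossref: `fetch_citation()` is a hypothetical helper, and `fake_cr_cn()` is a stand-in for `rcrossref::cr_cn()` that only mimics the warning shown in this thread, so the example runs offline.

```r
# Hypothetical helper: call `fetch` for one DOI and convert any warning or
# error into a recorded message rather than an interruption.
fetch_citation <- function(doi, fetch, ...) {
  tryCatch(
    list(doi = doi, citation = fetch(doi, ...), problem = NA_character_),
    warning = function(w) {
      list(doi = doi, citation = NA_character_, problem = conditionMessage(w))
    },
    error = function(e) {
      list(doi = doi, citation = NA_character_, problem = conditionMessage(e))
    }
  )
}

# Offline stand-in for rcrossref::cr_cn (assumption for illustration only):
# it warns for the DOI that currently triggers a server error.
fake_cr_cn <- function(doi, format = "text") {
  if (doi == "10.1111/geb.13346") {
    warning("v1/works/10.1111/geb.13346/transform w/ (500) - ")
    return(NULL)
  }
  paste("citation for", doi)
}

dois <- c("10.1111/ddi.13378", "10.1111/geb.13346")
results <- lapply(dois, fetch_citation, fetch = fake_cr_cn, format = "text")

# Keep only the DOIs that produced a warning or an error.
failed <- Filter(function(r) !is.na(r$problem), results)
```

With rcrossref installed, passing `fetch = rcrossref::cr_cn` should exercise the real API. Note that this helper treats any warning as a failure, which may be stricter than needed. The installed crul version can be checked beforehand with `utils::packageVersion("crul") >= "1.2.0"`.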