ropensci / rcrossref

R client for various CrossRef APIs
https://docs.ropensci.org/rcrossref

Problem with DOI containing trailing slash "/" (and maybe other url-breaking symbols like ; # ? < > \ ) #239

Open eutanatos opened 1 year ago

eutanatos commented 1 year ago

I have a problem retrieving metadata for DOIs that contain a trailing slash, like this one: "10.36652/0042-4633-2023-102-5-404-413/".

_cr_works_ does not resolve this DOI:

rcrossref::cr_works("10.36652/0042-4633-2023-102-5-404-413/")$data

#Warning message:
#404 (client error): /works/10.36652/0042-4633-2023-102-5-404-413/ - Resource not found. 

The Crossref REST API does not resolve it either.

However, doi.org does resolve this DOI, and I'm sure that it is registered in Crossref.

A little investigation led me to this statement in the API overview chapter of the current Crossref Unified Resource API documentation:

"You should always url-encode DOIs and parameter values when using the API. DOIs are notorious for including characters that break URLs (e.g. semicolons, hashes, slashes, ampersands, question marks, etc.)."

So I suggest a fix for this issue: add a transformation that URL-encodes the DOI. In my case, that means changing the trailing "/" to "%2F":

rcrossref::cr_works("10.36652/0042-4633-2023-102-5-404-413%2F", .progress = "text")$data
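
A minimal sketch of that transformation in base R, assuming only a trailing slash needs encoding. `encode_trailing_slash` is a hypothetical helper written for illustration and is not part of rcrossref:

```r
# Hypothetical helper (not part of rcrossref): percent-encode only a
# trailing "/" and leave the prefix/suffix separator untouched.
encode_trailing_slash <- function(doi) {
  sub("/$", "%2F", doi)
}

encode_trailing_slash("10.36652/0042-4633-2023-102-5-404-413/")
# "10.36652/0042-4633-2023-102-5-404-413%2F"

# DOIs without a trailing slash pass through unchanged.
encode_trailing_slash("10.1002/asi.24460")
# "10.1002/asi.24460"

# For comparison, base R's utils::URLencode() with reserved = TRUE
# encodes every reserved character, including the separator slash.
utils::URLencode("10.36652/0042-4633-2023-102-5-404-413/", reserved = TRUE)
```

The string returned by the helper has the same form as the DOI passed to _cr_works_ in the call above.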

P.S.: I found the Crossref documentation for members on constructing DOIs, where they ask not to use / in DOIs: "Do not encode forward slash / when resolving DOIs or retrieving metadata from our REST API". But there is only so much that words alone can do...

----------- additional info, perhaps this will be useful to someone else -----------

I found that adding an extra slash also gives a positive result, both for _cr_works_ and for the Crossref REST API:

rcrossref::cr_works("10.36652/0042-4633-2023-102-5-404-413//")$data

My investigation also led me to two old Crossref API issue threads, thread_1 and thread_2, that touch on this kind of problem. As I understand it, they have not ruled out the creation of DOIs with a trailing slash, and a trailing slash currently breaks "/agency" queries even if the DOI is URL-encoded.

njahn82 commented 1 year ago

Hi @eutanatos, I can confirm that Crossref does not return metadata for this record:

https://api.crossref.org/works/10.36652/0042-4633-2023-102-5-404-413/

However, I don't think rcrossref has an issue with trailing slashes, e.g.:

rcrossref::cr_works("10.1002/asi.24460/")
#> $meta
#> NULL
#> 
#> $data
#> # A tibble: 1 × 35
#>   alternative.id    archive container.title    created deposited published.print
#>   <chr>             <chr>   <chr>              <chr>   <chr>     <chr>          
#> 1 10.1002/asi.24460 Portico Journal of the As… 2021-0… 2023-08-… 2021-09        
#> # ℹ 29 more variables: published.online <chr>, doi <chr>, indexed <chr>,
#> #   issn <chr>, issue <chr>, issued <chr>, member <chr>, page <chr>,
#> #   prefix <chr>, publisher <chr>, score <chr>, source <chr>,
#> #   reference.count <chr>, references.count <chr>,
#> #   is.referenced.by.count <chr>, subject <chr>, title <chr>, type <chr>,
#> #   update.policy <chr>, url <chr>, volume <chr>, abstract <chr>,
#> #   language <chr>, short.container.title <chr>, assertion <list>, …
#> 
#> $facets
#> NULL

Created on 2023-10-02 with reprex v2.0.2