Cache DataCite responses locally

ropensci / rdatacite

Wrapper to DataCite metadata

https://docs.ropensci.org/rdatacite

Cache DataCite responses locally #31

Open katrinleinweber opened 4 years ago

katrinleinweber commented 4 years ago

It would be useful for teams of bibliometricians to be able to share/sync a cache of their DataCite queries. The pybliometrics Python package, for example, offers something analogous.

Which of the "cache"-related CRAN packages seems best suited for that? I tried memoise (see https://github.com/r-lib/memoise/issues/106), but that approach seems to have failed.

sckott commented 4 years ago

Thanks for the issue @katrinleinweber - What exactly do you want to cache? The HTTP response with headers and the raw response body? Or the parsed response as a data.frame/list? Or some other format?

katrinleinweber commented 4 years ago

I think pybliometrics caches the entire HTTP response.

I presume the risk of discarding potentially useful metadata (maybe an etag for requesting a refresh later?) is not worth the small saving in storage space. Maybe @pybliometrics-dev can comment on their reasoning about what to cache? I tried to find an explanation in the blame trail of the above-linked lines and read PR 17.
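To make the "cache the entire response" idea concrete, here is a minimal sketch in base R. It is not how pybliometrics or rdatacite do it; the helper names (cache_file, cache_write, cache_read, revalidation_headers) are hypothetical, and the status, headers, and body in the usage lines are made-up example values. It assumes header names are stored in lower case.

```r
# Hypothetical helpers, base R only; not part of rdatacite or pybliometrics.

# Map a request URL to a file inside the cache directory.
# The URL is hex-encoded so it is safe as a file name; a real
# implementation would more likely use a hash to keep names short.
cache_file <- function(url, cache_dir) {
  key <- paste(as.character(charToRaw(url)), collapse = "")
  file.path(cache_dir, paste0(key, ".rds"))
}

# Store status, headers, and raw body together so nothing is discarded.
cache_write <- function(url, status, headers, body, cache_dir) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  entry <- list(url = url, status = status, headers = headers,
                body = body, cached_at = Sys.time())
  saveRDS(entry, cache_file(url, cache_dir))
  invisible(entry)
}

# Return the cached entry, or NULL when the URL has not been cached yet.
cache_read <- function(url, cache_dir) {
  path <- cache_file(url, cache_dir)
  if (file.exists(path)) readRDS(path) else NULL
}

# Build the header for a conditional refresh from the stored ETag, if any.
revalidation_headers <- function(entry) {
  etag <- entry$headers[["etag"]]
  if (is.null(etag)) list() else list(`If-None-Match` = etag)
}

# Usage with made-up response values (no network access needed).
cache_dir <- file.path(tempdir(), "datacite-cache")
url <- "https://api.datacite.org/dois?query=climate"
cache_write(url, 200L,
            list(etag = "\"abc123\"", `content-type` = "application/json"),
            charToRaw("{\"data\": []}"), cache_dir)

entry <- cache_read(url, cache_dir)
rawToChar(entry$body)        # the stored body, unchanged
revalidation_headers(entry)  # header to send when asking for a refresh
```

Because the headers are kept alongside the body, a later request can send If-None-Match and reuse the cached body when the server reports that nothing has changed. A shared cache_dir (for example on a synced folder) would be one way for a team to reuse each other's entries.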

sckott commented 4 years ago

thanks @katrinleinweber

I have been tinkering with a package in development, https://github.com/ropenscilabs/webmiddens, built exactly for the use case of caching HTTP requests/responses with expiry, etc. I'll try to get that working.
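For anyone unfamiliar with expiry-based caching, the idea can be sketched in a few lines of base R. This is only an illustration of the concept, not the webmiddens API; cache_is_fresh and its arguments are hypothetical names, and it assumes the cache entry's age is taken from the file's modification time.

```r
# Hypothetical helper, base R only; not the webmiddens API.
# TRUE when the cache file exists and is no older than max_age_secs.
cache_is_fresh <- function(path, max_age_secs, now = Sys.time()) {
  if (!file.exists(path)) return(FALSE)
  age_secs <- as.numeric(difftime(now, file.mtime(path), units = "secs"))
  age_secs <= max_age_secs
}

# Usage: write a dummy cache entry, then check it now and "two hours later".
path <- tempfile(fileext = ".rds")
saveRDS(list(status = 200L), path)

fresh_now   <- cache_is_fresh(path, max_age_secs = 3600)
fresh_later <- cache_is_fresh(path, max_age_secs = 3600,
                              now = Sys.time() + 7200)
missing     <- cache_is_fresh(tempfile(), max_age_secs = 3600)
```

A caller would serve the stored response while the entry is fresh and make a new request (then overwrite the file) once it has expired.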

sckott commented 4 years ago

@katrinleinweber install the version on the middens branch: remotes::install_github("ropensci/rdatacite@middens")

The README https://github.com/ropensci/rdatacite/tree/middens#caching has some instructions on use.

The cached data persists on disk in a binary format to save disk space, so it's not really human readable; a human-readable format is a feature I want to add to webmiddens, though. You can set the cache folder path; see ?dc_caching

katrinleinweber commented 4 years ago

Awesome, thank you! I hope I'll have time to test it this March, but there is a risk that I won't.