r-lib / httr2

Make HTTP requests and process their responses. A modern reimagining of httr.
https://httr2.r-lib.org
Other
236 stars 57 forks source link

allow user to override parts of `resp_is_cacheable` #421

Closed datapumpernickel closed 9 months ago

datapumpernickel commented 9 months ago

Hi!

first of all, compliments on getting out version 1.0 and the overall package itself. It is very well thought through and works like a charm. In the rewrite of comtradr we have grown to love it!

I have thought about implementing caching responses in the above mentioned package, because it seems sensible to have this functionality for a package where the free API has considerable limits in how much data you can query and the server does error our every once in a while, leading to the loss of data, if you have not written your code in a robust way, for which users often lack time or knowledge.

Since we are already in the httr2 universe, I was hoping to achieve this through req_cache. Unfortunately, the Comtrade API does not provide the necessary request headers to pass the tests in resp_is_cacheable.

The error, to the best of my understanding, is triggered here:

if (!any(resp_header_exists(resp, c("Etag", "Last-Modified", "Expires")))) {
    return(FALSE)
  }

Hence, the response from Comtrade is flagged as non-cacheable. Which I understand, makes sense, because httr2 cannot verify whether the resource has changed or when it might expire.

I would like to be able to override this check, to still cache results for the following reasons:

1) API's will not always be implemented the right way, sometimes they might miss these headers and we should still be able to cache some responses. The user should be able to run this operation on their own risk. 2) For Trade Data it actually makes sense to not update the cache everytime the resource has changed, because it is being updated all the time anyways, meaning we always only catch a glimpse of the data at a specified time. If new data is needed, the cache can be cleared manually. I understand that we could also implement a different type of caching outside of httr2 for this functionality, but it would be much more convenient to do it inside of the already existing framework.

@jonthegeek over at rOpenSci has offered to implement such a functionality, if this is something you guys would think might be sensible to implement.

Thank you very much in advance for taking a look at this and looking forward to your input!

On a sidenote: While trying this whole thing out, I was a bit confused, because debug=T did not return any messages to me as I would have expected. Not sure why. The code is enclosed. It uses some internal functions from comtradr to test the caching. You will however need a API key to test this - not sure how else to make it reproducible?

test <- comtradr:::ct_check_params(freq = 'A',
                                   type = 'goods',
                              commodity_classification = 'HS',
                      commodity_code = 'TOTAL',
                      reporter = "DEU",
                      partner = 'AUS',
                      start_date = '1990',
                      end_date = '1991',
                      flow_direction = "import",
                      partner_2 = 'World',
                      mode_of_transport = 'Air',
                      customs_code = 'C00',
                      verbose = F,
                      update = T,
                      extra_params = NULL) |>
  comtradr:::ct_build_request(cache_dir = 'data',
                              primary_token = comtradr::get_primary_comtrade_key()) |>
  httr2::req_cache(path = 'data', debug = T)

httr2::req_perform(test)
#> <httr2_response>
#> GET
#> https://comtradeapi.un.org/data/v1/get/C/A/HS?cmdCode=TOTAL&flowCode=M&partnerCode=36&reporterCode=280%2C276&period=1990%2C1991&motCode=1000&partner2Code=0&customsCode=C00&includeDesc=TRUE
#> Status: 200 OK
#> Content-Type: application/json
#> Body: In memory (58 bytes)
httr2::resp_headers(httr2::last_response())
#> <httr2_headers>
#> Transfer-Encoding: chunked
#> Content-Type: application/json; charset=utf-8
#> Content-Encoding: gzip
#> Vary: Accept-Encoding
#> Request-Context: appId=cid-v1:9b6e1d5a-3728-46ff-b743-6d33d23e54a6
#> x-frame-options: deny
#> X-Content-Type-Options: nosniff
#> X-XSS-Protection: 1;mode=block
#> strict-transport-security: max-age=31536000;includeSubDomains
#> content-security-policy: frame-src 'self'
#> x-permitted-cross-domain-policies: none
#> Referrer-Policy: no-referrer-when-downgrade
#> Permissions-Policy: accelerometer=(), camera=(), geolocation=(), gyroscope=(), magnetometer=(), microphone=(), payment=(), usb=()
#> Date: Sat, 23 Dec 2023 16:38:04 GMT
Created on 2023-12-23 with [reprex v2.0.2](https://reprex.tidyverse.org/)
jonthegeek commented 9 months ago

With the details, I don't think this particular override makes sense (as we're discussing in slack). I can imagine cases where the headers are incorrect but caching still could work, but I don't think this is such a case.

datapumpernickel commented 9 months ago

Ok, will close this issue for now! Thanks @jonthegeek!