Closed by maltman 1 year ago
Thanks a lot @maltman for this report. Indeed, at this time the mechanism is rather primitive. We'll definitely look into handling this in while_oai().
Researching this further (CC @sckott)...
It would be natural for while_oai() to switch from httr::GET() to httr::RETRY(), which has built-in functionality to take advantage of retry-after in the response header.
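For reference, a minimal sketch of what such a switch could look like for a single page request. The function name fetch_oai_page and its arguments are hypothetical and are not the package's actual internals; only the httr calls are real. Per the discussion in this thread, RETRY() may honor retry-after only on HTTP 429, so this alone might not cover a 503 from the server.

```r
# Sketch only: fetch one OAI-PMH page, retrying on failure.
# `fetch_oai_page` is a hypothetical helper, not part of the oai package.
fetch_oai_page <- function(url, query, times = 5) {
  res <- httr::RETRY(
    "GET", url,
    query = query,     # passed through to the underlying GET request
    times = times,     # maximum number of attempts
    pause_base = 1,    # base of the exponential backoff, in seconds
    pause_cap = 60     # upper bound on a single wait, in seconds
  )
  # Turn a failed final attempt into an R error.
  httr::stop_for_status(res)
  httr::content(res, as = "text", encoding = "UTF-8")
}
```

A call might look like fetch_oai_page(base_url, list(verb = "ListRecords", metadataPrefix = "oai_dc")), where base_url is whatever OAI-PMH endpoint is being harvested.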
Problems/questions:
The service responds with HTTP 503 and retry-after in the header with the number of seconds to wait. This seems somewhat unconventional, as one would probably expect HTTP 429 in such a situation. Incidentally, httr::RETRY() handles retry-after, but only upon receiving HTTP 429; this is hard-coded in https://github.com/r-lib/httr/blob/21ff69f219ad11298854a63b8f753389088cf382/R/retry.R#L104
Is retry-after upon HTTP 503 a convention among OAI services?
Thanks for the quick response!
Comment: yes, a 429 is likely a better choice for a new protocol. However, a 503 plus retry-after is documented in both the HTTP 1.1 RFC and the OAI-PMH specs, so it seems likely this is not a case limited to arXiv.
See http://www.openarchives.org/OAI/openarchivesprotocol.html#StatusCodes and http://www.openarchives.org/OAI/openarchivesprotocol.html#FlowControl in the OAI-PMH spec, and https://datatracker.ietf.org/doc/html/rfc7231#section-6.6.4 in the HTTP 1.1 RFC
(429s are part of a different RFC, "additional status codes").
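To make the 503 plus retry-after flow concrete, here is a rough sketch of how a client could honor it by hand. Everything here is an assumption for illustration: parse_retry_after and with_retry_after are hypothetical names, and do_request stands for any function that returns a list with a status and a retry_after field. The header value may be either a number of seconds or an HTTP date, so both are handled.

```r
# Parse a Retry-After value: delay in seconds or an HTTP-date.
# Returns seconds to wait, or `default` when the value cannot be parsed.
# Note: parsing day and month names assumes an English/C time locale.
parse_retry_after <- function(value, now = Sys.time(), default = 10) {
  if (is.null(value) || is.na(value) || !nzchar(value)) return(default)
  if (grepl("^[0-9]+$", value)) return(as.numeric(value))
  when <- as.POSIXct(strptime(value, "%a, %d %b %Y %H:%M:%S", tz = "GMT"))
  if (is.na(when)) return(default)
  max(0, as.numeric(difftime(when, now, units = "secs")))
}

# Call `do_request()` until it returns something other than HTTP 503,
# sleeping for the advertised interval between attempts.
# `do_request` must return list(status = <integer>, retry_after = <chr or NULL>).
with_retry_after <- function(do_request, max_tries = 5, sleep = Sys.sleep) {
  for (i in seq_len(max_tries)) {
    res <- do_request()
    if (res$status != 503L) return(res)
    # Do not sleep after the last attempt; just hand back the 503.
    if (i < max_tries) sleep(parse_retry_after(res$retry_after))
  }
  res
}
```

With httr, do_request could wrap httr::GET() and fill the two fields from httr::status_code(res) and httr::headers(res)[["retry-after"]]. The sleep argument is injectable only so the loop can be exercised without actually waiting.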
Thanks @maltman for these links.
I'm testing it right now, but your original query just does not want to fail now and I'm getting 200s only... :D
I just pushed the i64-retrying branch, which replaces GET() with RETRY(). The CI still chews on it. @maltman, can you please install from that branch and check whether it works for you?
BTW that OAI query returns quite big chunks of results. You may want to take advantage of a dumper function (see ?dumpers) to save the results incrementally.
Try:
This results in:
Service Unavailable (HTTP 503)
No results are returned, even though partial results were collected. So there is no graceful way to resume.
There are three issues here that seem to make the overall interface non-robust:
stop() which ...
A possible workaround could be to write an external wrapper that divides up the "from" - "to" interval into small chunks, and uses purrr:: wrappers to schedule each chunk and retry... This is inelegant. A cleaner solution might be to handle the OAI flow control explicitly and internally in while_oai(), and to at least return partial values and a resumption token on error.
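As a rough illustration of the "return partial values and a resumption token" idea, here is a sketch of a harvesting loop that never throws away what it has already collected. It is not how while_oai() works; harvest_resumable and fetch_page are hypothetical names, and fetch_page(token) is assumed to return list(records = <list>, token = <chr or NULL>) or to raise an error.

```r
# Sketch: page through results, keeping partial output on failure.
# `fetch_page` is a hypothetical function taking a resumption token
# (NULL for the first request) and returning list(records, token).
harvest_resumable <- function(fetch_page, token = NULL) {
  records <- list()
  repeat {
    page <- tryCatch(fetch_page(token), error = function(e) e)
    if (inherits(page, "error")) {
      # Hand back what was collected plus the token of the page that failed,
      # so the caller can resume from exactly that point.
      return(list(records = records, token = token, complete = FALSE,
                  error = conditionMessage(page)))
    }
    records <- c(records, page$records)
    token <- page$token
    # An absent or empty token means the list is complete.
    if (is.null(token) || !nzchar(token)) break
  }
  list(records = records, token = NULL, complete = TRUE, error = NULL)
}
```

If the result has complete set to FALSE, calling harvest_resumable(fetch_page, token = out$token) again would pick up where the failed request left off, provided the server still accepts that token.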