ropensci / crul

R6 based http client for R (for developers)
https://docs.ropensci.org/crul

How to retry requests for AsyncQueue? #159

Closed moldach closed 1 year ago

moldach commented 3 years ago

Could you please add an example to the documentation for retry() that shows how to retry requests made with AsyncQueue()?

library(crul)
library(data.table)
library(jsonlite)

# 5,000 URLs/API calls
tmp <- fread("testURLs.txt", header = FALSE)
urls <- unlist(tmp)

# Build one GET request per URL
reqlist <- vector("list", length(urls))
for (i in seq_along(urls)) {
  reqlist[[i]] <- HttpRequest$new(urls[[i]])$get()
}

## rate-limit of 300/min did not have any 429 status for 5000 API calls
## However, there are ~100 "500 status" which vary from run-to-run
## These need to be retried
out <- AsyncQueue$new(.list = reqlist, req_per_min = 300)

start <- Sys.time()
out$request() # make requests
end <- Sys.time()
total_time <- as.numeric(end - start, units = "mins")
print(paste0("Making Requests() took ", total_time, " minutes."))

start <- Sys.time()
out$responses() # list responses
end <- Sys.time()
total_time <- as.numeric(end - start, units = "mins")
print(paste0("Listing responses() took ", total_time, " minutes."))

# Check for 429 and other non-200 statuses
resp <- out$responses()

x <- c()
for (i in seq_along(resp)) {
  if (resp[[i]]$status_code==200){
    #print("200 status returned")   
  } else if (resp[[i]]$status_code==404) {  
    #print(paste0("404 status returned for row ", i, ": Results not Found"))
  } else if (resp[[i]]$status_code==429) {
    #print(paste0("429 status returned for row ", i, ": Rate limit exceeded"))
  } else if (resp[[i]]$status_code==500) {
    print(paste0("500 status returned for row ", i, ": An internal error has occurred"))
    x[i] <- i
  }  else {
    print(paste0(resp[[i]]$status_code, " for row ", i))  
  }
}
x <- x[!is.na(x)]
length(x)  # shows "101"
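
The status check above can also be written without an explicit loop. This is a minimal sketch, assuming each element of resp exposes a status_code field (as in the loop above); summarise_status is a hypothetical helper name, not part of crul, and the stand-in responses exist only so the sketch runs without network access.

```r
# Hypothetical helper (not part of crul): tabulate status codes and
# return the positions of responses whose status is in `retry_on`.
summarise_status <- function(resp, retry_on = 500L) {
  codes <- vapply(resp, function(r) as.integer(r$status_code), integer(1))
  list(
    counts = table(codes),              # how many responses per status code
    failed = which(codes %in% retry_on) # positions that may need a retry
  )
}

# Stand-in responses; real code would pass out$responses()
fake_resp <- list(
  list(status_code = 200), list(status_code = 500),
  list(status_code = 404), list(status_code = 500)
)
res <- summarise_status(fake_resp)
res$counts
res$failed
```

With real data, res$failed plays the role of x in the loop above.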

If I run this code again, a different set of requests returns a 500 status, which suggests that some of these API calls need to be retried.

### Some of the 500 statuses repeat across runs, but not all of them.
### We need to see whether these are transient API errors that can be retried
### or consistent errors because information for that uniprotid does not exist.

## The first attempt produced 100 "500 status" errors from 5000 API calls; the second produced 101
## x from the first run
try1 <- as.data.frame(x)
## x from the second run
try2 <- as.data.frame(x)

library(dplyr)
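# rows that failed in the first run but not the second
anti_join(try1, try2)
# rows that failed in the second run but not the first
anti_join(try2, try1)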
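
The same comparison can be sketched in base R with set operations on the index vectors. compare_runs is a hypothetical helper name, and the two small vectors are made-up stand-ins for the x vectors from each run.

```r
# Hypothetical helper: split failing indices from two runs into those that
# failed both times (likely persistent) and those that failed only once
# (likely transient, so worth retrying).
compare_runs <- function(run1, run2) {
  list(
    both        = intersect(run1, run2),
    only_first  = setdiff(run1, run2),
    only_second = setdiff(run2, run1)
  )
}

# Made-up indices standing in for x from each run
cmp <- compare_runs(c(3L, 17L, 42L), c(17L, 42L, 99L))
cmp$both
cmp$only_first
cmp$only_second
```

Indices that appear only in one run point to intermittent server errors; indices in both runs are more likely to be genuine failures for that URL.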

The following doesn't work on either the resp or the out object: res_get <- x$retry("GET", path = "status/400").

How can retry() be run on AsyncQueue() results?
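
One way to approximate a retry without a retry() method on the queue is to re-queue only the failed requests. The following is a rough sketch, not crul's own retry mechanism: retry_failed is a hypothetical helper, it assumes plain GET requests that can be rebuilt from each response's url field (any headers or options on the original requests are not carried over), and the returned list does not preserve the original order.

```r
# Hypothetical helper (not part of crul): re-run requests whose responses
# have a status in `retry_on`, for at most `max_passes` extra passes.
retry_failed <- function(resp, retry_on = 500L, req_per_min = 300,
                         max_passes = 3) {
  for (pass in seq_len(max_passes)) {
    codes <- vapply(resp, function(r) as.integer(r$status_code), integer(1))
    bad <- which(codes %in% retry_on)
    if (length(bad) == 0) break  # nothing left to retry

    # Assumption: a plain GET on the response URL reproduces the request
    reqs <- lapply(resp[bad], function(r) crul::HttpRequest$new(url = r$url)$get())
    queue <- crul::AsyncQueue$new(.list = reqs, req_per_min = req_per_min)
    queue$request()

    # Keep the good responses and append the fresh ones (order changes)
    resp <- c(resp[-bad], queue$responses())
  }
  resp
}

# Usage, given the objects from the code above:
# resp_final <- retry_failed(out$responses(), retry_on = 500L, req_per_min = 300)
```

Capping the number of passes keeps URLs that fail consistently from being retried forever.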

moldach commented 3 years ago

Is retry() currently not supported by AsyncQueue()?

If so, could this please 🙏 be added?

sckott commented 3 years ago

Yeah, sorry, HttpRequest doesn't have a retry method yet. So I think it just needs to be supported there, and then you can use it in AsyncQueue.
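
In the meantime, a possible workaround is to retry the failed URLs one at a time with HttpClient, which does have a retry method. This is a sketch only: retry_one and failed_urls are hypothetical names, the times and retry_only_on values are illustrative, and the calls run sequentially rather than asynchronously.

```r
# Hypothetical helper: retry a single GET with HttpClient's retry method.
# Retries only when the server answers 500, up to `times` attempts.
retry_one <- function(url, times = 3) {
  cli <- crul::HttpClient$new(url = url)
  cli$retry("GET", times = times, retry_only_on = 500)
}

# Usage, assuming `failed_urls` is a character vector of URLs that returned 500:
# retried <- lapply(failed_urls, retry_one)
```

This gives up the concurrency and rate limiting of AsyncQueue, so it is best suited to a small number of failed requests.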

sckott commented 1 year ago

asked curl maintainer about this, we'll see

sckott commented 1 year ago

@moldach this is finally done. Install from GitHub and try again. There's a brief example in the AsyncQueue docs: https://docs.ropensci.org/crul/reference/AsyncQueue.html#ref-examples