moldach closed this issue 3 years ago
Thanks for the issue @moldach
will have a look
@moldach Did you try this yet https://docs.ropensci.org/crul/reference/AsyncQueue.html
Thanks for pointing me in the right direction @sckott
I'm still a bit confused as the documentation for simple async mentions how to parse the HttpClient
results, e.g.:
(cc <- Async$new(
urls = c(
'https://httpbin.org/get?a=5',
'https://httpbin.org/get?a=5&b=6',
'https://httpbin.org/ip'
)
))
(res <- cc$get())
res[[1]]$parse("UTF-8")
However, the same doesn't work for AsyncQueue():
reqlist <- list(
HttpRequest$new(url = "https://httpbin.org/get")$get(),
HttpRequest$new(url = "https://ropensci.org/blog")$get(),
HttpRequest$new(url = "https://ropensci.org/careers")$get()
)
out <- AsyncQueue$new(.list = reqlist, bucket_size = 5, sleep = 3)
out
out$request() # make requests
out$responses() # list responses
out$parse() ### Returns character(0)
character(0)
Or this:
out[[1]]$parse("UTF-8")
Error in out[[1]] : wrong arguments for subsetting an environment
P.S. Also, how does one query a specific response, say 10,000th of 500,000 urls? out$responses() tries to print all to the console and is therefore limited by max.print() in RStudio.
The usage for AsyncQueue is a bit different from Async because of the nature of what it takes to do HTTP requests from a queue. With AsyncQueue the responses go into a bucket (a list, essentially), and instead of returning a new object, you use the same object to access the results (responses).
$parse() comes from the inherited parent class AsyncVaried - I need to fix this method for the queue class. Until it's fixed, to iterate through responses with AsyncQueue you can use lapply or similar, like:
lapply(out$responses(), function(x) x$parse())
Also, how does one query a specific response, say 10,000th of 500,000 urls...
What do you mean by "query a specific response"?
I'll consider making an S3 print method for the $responses() method so that you don't have to deal with a huge dump of text to the console ...
What do you mean by "query a specific response"?
I need to make 372,059 API calls and unfortunately the documentation (for the API I am querying) doesn't specify what the rate-limit is; therefore I need to try and find it by trial-and-error.
Ultimately, I would like to be able to check for status=429 errors:
Too many requests, Please try again in 5 minutes.
So for example, it would be nice to check a particular response() for API call #10,000 to see if I'm making too many calls.
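A minimal sketch of how such a check could look once all requests have finished. It assumes each element of out$responses() is a crul HttpResponse exposing a status_code field; the helper names status_codes and rate_limited are hypothetical and not part of crul.

```r
# Collect the HTTP status code of every response in a list.
# Assumption: each element has a `status_code` element/field,
# as crul HttpResponse objects do.
status_codes <- function(responses) {
  vapply(responses, function(x) as.integer(x$status_code), integer(1))
}

# Positions of the responses that came back with HTTP 429 (too many requests).
rate_limited <- function(responses) {
  which(status_codes(responses) == 429L)
}

# Usage with the queue from this thread, after out$request() has finished:
# resps <- out$responses()
# resps[[10000]]$status_code  # inspect one response without printing them all
# rate_limited(resps)         # indices of the requests that hit the rate limit
```

Assigning out$responses() to a variable and indexing into it avoids printing the whole list to the console.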
Right now AsyncQueue is configured to be blocking. That is, when you run $request() you have to wait for it to finish all requests before doing anything else in that R console. We could change the class so it's not blocking, and then you could inquire about the status of requests, etc. There are tradeoffs to this, since you then have to make sure to check when all requests are complete, etc. BUT that would be a significant change and would take some time to do. Opened an issue for that.
@moldach your example above for AsyncQueue should work now
I took a shot at updating to the latest version with remotes::install_github("ropensci/crul"), but I'm still getting the same error for the example above:
reqlist <- list(
HttpRequest$new(url = "https://httpbin.org/get")$get(),
HttpRequest$new(url = "https://ropensci.org/blog")$get(),
HttpRequest$new(url = "https://ropensci.org/careers")$get()
)
out <- AsyncQueue$new(.list = reqlist, bucket_size = 5, sleep = 3)
out
out$request() # make requests
out$responses() # list responses
out$parse() ### Returns character(0)
out[[1]]$responses() ### Returns Error in out[[1]] : wrong arguments for subsetting an environment
Did you install from this GitHub repository, like remotes::install_github("ropensci/crul")? Then make sure to restart R before trying again. It's working for me right now. And out[[1]]$responses() should be out$responses()[[1]]
out[[1]]$responses() should be out$responses()[[1]]
Changing this fixed it.
Thanks! ❤️
great!
I'm getting an error from exceeding the rate limit when attempting to make 2000 API calls to https://api.targetsafety.info/
How does one limit the number of concurrent API calls to avoid this?
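One possible approach, sketched under assumptions: the bucket_size and sleep arguments of AsyncQueue shown earlier in this thread control how many requests go out at once and how long the queue pauses between buckets. The helpers make_buckets and min_pause_seconds below are hypothetical, written only to illustrate how those two settings translate into batches and waiting time. The values 10 and 5 are placeholders, since the API's actual rate limit is not known.

```r
# Split a vector of URLs into buckets of at most `bucket_size` elements,
# mirroring the idea of sending one bucket, pausing, then sending the next.
make_buckets <- function(urls, bucket_size) {
  split(urls, ceiling(seq_along(urls) / bucket_size))
}

# Rough lower bound on total pause time, assuming one sleep between
# consecutive buckets (an assumption about how the queue behaves).
min_pause_seconds <- function(n_requests, bucket_size, sleep) {
  (ceiling(n_requests / bucket_size) - 1) * sleep
}

# 2000 requests, 10 at a time, 5 seconds between buckets:
min_pause_seconds(2000, bucket_size = 10, sleep = 5)  # 995 seconds

# Usage with crul (not run here; `urls` is a character vector of API URLs):
# reqlist <- lapply(urls, function(u) crul::HttpRequest$new(url = u)$get())
# out <- crul::AsyncQueue$new(.list = reqlist, bucket_size = 10, sleep = 5)
# out$request()
```

Lowering bucket_size or raising sleep reduces the request rate at the cost of a longer total run, so a trial-and-error search for the limit could start conservatively and loosen from there.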