Closed stephlocke closed 7 years ago
Hi @stephlocke, thanks for reporting the problem and for the reproducible example. I ended up writing a long response that goes into a lot of detail, but the gist of it is that I believe the issue you're experiencing is related to #13, for which a fix is currently in devel and I'll be submitting an update to CRAN soon. My reasoning is below:
I did find a rate limit error recently that I've been looking into (#13), but it was occurring about 2-5 times out of every 1,000, not 1 out of every 4 (!). So I'm trying to figure out if your test failing is a manifestation of #13, in which case the version currently in the devel branch fixes the problem and I'll be submitting the update to CRAN shortly, or something else. I forked HIBPwned and have run the tests several times without any failing tests, which also leads me to believe that the problem may be #13 (which crops up infrequently so would be hard to replicate by just running the tests a few times).
When I run your example, I'm also seeing around 20 of the 100 trials taking < 1.5s, as you are (way more than I'd expect based on #13). However, my_check
never seems to fail -- is there a reason for this? I re-created the account_breaches
function without the ratelimiting and then ran it twice in a row and the second result was in fact a 429 error, so I'd expect if a rate limit was broken while running the example that my_check
would fail.
I know microbenchmark
uses the warmups to measure overhead and then subtracts out the estimated overhead from execution times to try to get a "true" measure of a function's execution time. But ratelimitr
doesn't need to ensure each function call takes > 1.5 seconds, only that > 1.5 seconds have elapsed between successive calls. That my_check
isn't failing suggests to me that, at least when I'm running your example, the rate limit isn't actually being broken. If not, then it might be that microbenchmark
is not measuring the correct thing for our purposes. What we'd like to know for certain is that httr::GET
is not being called more than once in 1.5 seconds. So I mocked up the following example, using Sys.sleep(runif(1))
as a stand-in for what's happening in the GET
request, but mock_account_breaches
also records the exact time that the fake GET
request is about to be called.
mock_account_breaches <- function() {
tmp <- microbenchmark::get_nanotime()
Sys.sleep(runif(1, min = 0, max = 1))
tmp
}
mock_account_breaches <- ratelimitr::limit_rate(
mock_account_breaches,
ratelimitr::rate(n = 1, period = 1.5)
)
Now to test:
iter <- 100
res <- replicate(iter, mock_account_breaches())
lagged_res <- c(rep(NA, 1), res[seq_len(iter - 1)])
# get_nanotime records time to the nanosecond
summary((res - lagged_res) / 1E9)
And what I see:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
1.529 1.533 1.533 1.533 1.533 1.538 1
Which suggests to me that most of the time the rate limit is being obeyed in the real account_breaches
function. Has the HIBPwned test been failing regularly, or has it only failed once or twice? If the latter, then I'm pretty sure that the forthcoming update (currently on the devel branch) will solve your issue.
It's only failing recently ... on CRAN and then rerunning the tests on my windows machine results in the same thing
I copied my_check
from the microbenchmark example but I'm not sure it's doing what I think it should be doing
I see. Well your code looks like it ought to work, so in response to your original question, the problem must be ratelimitr. I'll continue working on this.
Ok, after updating my R installation to 3.4.0, I am able to consistently get the same test failure as the one appearing in the CRAN check. The upcoming patch to ratelimitr
(currently in devel) does not resolve the issue. However, based on the HIBPwned API documentation, which says "It's advisable to avoid querying the API at exactly the rate limit as network behaviour may result in some requests arriving within the retry period and causing a 429. Adding a 100 millisecond delay between requests will usually ensure this won't happen," I went and changed the definition of HIBPwned:::get
to use ratelimitr::rate(n = 1, period = 1.6)
(rather than period = 1.5
) and now the tests are passing!
I'll still go ahead and submit the patch based on #13 to CRAN, but for this issue your best bet is to update HIBPwned:::get
to:
get <- ratelimitr::limit_rate(
httr::GET,
ratelimitr::rate(n = 1, period = 1.6)
)
That seems to fix the problem, with or without the forthcoming updates to ratelimitr
.
I upped it and submitted it to CRAN but I think I'm also missing some vital concept about ratelimitr.
I expect that if I set a ratelimit, R will not emit a request until the rate limit period expires e.g. R will wait 1.5s since the last request was completed to send another one. With network latency, that means I usually expect each request to hit approximately 1.51s after the last request.
From what you're saying about the 3.4.0 upgrade it sounds like before then our code worked in that perhaps R added extra computation/processing time so things took that little bit longer and therefore didn't error as often. Perhaps now the rate limit is still being respected but because of the recent R optimisations, it's all executing a little quicker?
Yeah I've been trying to figure out what about 3.4.0 could cause the change, your theory makes sense to me (though unsure how to test it). Here's a rough outline of the way ratelimitr currently works, at least in the case of HIBPwned. Say you use the account_breaches
function:
Sys.sleep
for the appropriate amount of time.What you are suggesting, which I think makes sense, is that it should be like this instead:
That would better respect the fact that time passes between sending the GET request and the HIBPwned server receiving said request. When I've been developing and testing the package, my goal has been to ensure that the request is not emitted more than once per 1.5 seconds, but you've helped me realize the goal should be to make sure that the request is not received more than once per 1.5 seconds.
You can imagine an extreme scenario, where due to a network issue a GET request takes a full 1.5 seconds to reach the HIBPwned server (but the results take .1 seconds to return). By the time the function completes execution, ratelimitr believes that it is free to send another request immediately, but the server won't accept another request from you for another 1.4 seconds. If the next request you send only takes .1 seconds to get there, you've broken the rate limit (from the HIBPwned server's point of view, which is the only one that matters) even though ratelimitr worked as designed.
I'll work on changing the order of when the timestamp is recorded, to work like the second outline above. It seems like a simple change, but will require some reorganizing.
I wrote some unit tests that mimic the "extreme scenario" outlined above, and updated the package to have the rate limiter record timestamps after a function executes. The new version should be much more robust for use with web APIs. I'll be submitting the update to CRAN in the next day or two. Also, with the new updates, the MWE that you wrote above works out to:
min lq mean median uq max neval
1.600398 1.708108 1.719228 1.718625 1.73168 1.92302 100
A test started failing on HIBPwned that looks to be due to instability in the rate limiter. To replicate this issue I ran a microbenchmark evaluating the function 100 times with a warm up of 1 - which should mean all executions in the benchmark get rate limited to 1.5 seconds or greater. The results indicate that at least 25% of the evaluations are not adhering to a 1.5s rate limit:
It could be that I'm not utilising ratelimitr in the best way - could you take a look and determine whether the problem is in my code or ratelimitr?
HIBPwned architecture
account_breaches
Underlying functions
The rate limit is implemented centrally so all user surfaced funcctions are governed by the same limit
MWE