tarakc02 / ratelimitr

Rate limiting for R functions

rate limited functions still run too fast at times #7

Closed tarakc02 closed 7 years ago

tarakc02 commented 7 years ago

Noticed that unit tests fail once every 5-10 tries. Could be a rounding issue?

stephlocke commented 7 years ago

I think this is partly due to floating-point error in the divisions, but it's also likely to come from the variable precision of Sys.time() across operating systems. It might be worth checking out the Rcpp code behind microbenchmark, or possibly depending on it and making use of it where possible.

tarakc02 commented 7 years ago

Thanks for the advice! Yeah, I started rounding to the nearest 1/60th of a second based on the following from ?proc.time: "The resolution of the times will be system-specific and on Unix-alikes times are rounded down to milliseconds. On modern systems they will be that accurate, but on older systems they might be accurate to 1/100 or 1/60 sec. They are typically available to 10ms on Windows." Note, though, that this is a user-controlled parameter (the precision argument to limit_rate).
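For reference, a minimal sketch of how that parameter is exposed (assuming ratelimitr's documented limit_rate()/rate() interface):

```r
library(ratelimitr)

# precision = 60 (the default) means times are tracked in 1/60-second
# units; users on systems with finer clocks can raise it
f <- limit_rate(function() Sys.time(), rate(n = 10, period = 1), precision = 60)
```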

I'm just noticing that I do not take the ceiling when calculating the time_to_wait, so the differences when the tests fail (the most recent failure was off by 2.78e-17 seconds) might just come from not waiting quite long enough. I'm updating the unit tests to do more thorough and repeated testing (since I can only get the error to show up every handful of tries). Once I get to a point where the test fails consistently, I'll change the calculation of time_to_wait to see if that helps.
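To illustrate the suspected bug, a rough sketch (these function names are hypothetical, not the package's internals):

```r
# naive version: floating-point subtraction can come up a hair short
# (e.g. off by ~2.78e-17 seconds, as in the failing test above)
naive_wait <- function(expires_at, now) expires_at - now

# taking the ceiling in precision-units guarantees we never wait too
# little, at the cost of occasionally waiting slightly too long
safe_wait <- function(expires_at, now, precision = 60) {
  ceiling((expires_at - now) * precision) / precision
}
```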

I like your idea of using microbenchmark or something related as a more precise timer. For simplicity's sake, I'm hoping I can avoid additional dependencies, but I will definitely explore all options...

tarakc02 commented 7 years ago

It's not elegant, but the most recent update should at least not break the set speed limits (though it might wait a tad longer than necessary). I've added a test that runs a rate-limited function repeatedly and checks that it never overruns the limit (the original tests only ran the limited function a few times, hence the occasional but not consistent failures). Before the updates, the new test failed every time I ran it; now it passes. The "trick" for now was adding a buffer (of one precision-unit, by default 1/60th of a second) to the time at which a previous call's "hold" on the queue expires, roughly as sketched below.
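Roughly, the change amounts to this (a sketch under assumed names, not the package's actual internals):

```r
# when recording a call, pad its hold on the queue by one
# precision-unit (1/60 second by default), so floating-point error
# can't release the hold early
hold_expires_at <- call_time + period + (1 / precision)
```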

For the use cases I'm imagining (calls to a web API), I'm hoping the potential performance hit won't be much of a problem. In my own experiments, I created a function with a limit of 10 calls per 0.1 seconds, then repeatedly measured the time it took to call the function 11 times: the measured times were between 0.14 and 0.16 seconds. When the limit was instead 10 calls per 1 second, the measured times were between 1.04 and 1.06 seconds. Still, I'm keeping this issue open as I learn more, and will be researching how microbenchmark, for instance, achieves such precise timing.
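A minimal version of that timing experiment, assuming the interface sketched earlier:

```r
library(ratelimitr)

# limit a trivial function to 10 calls per 0.1 seconds
f <- limit_rate(function() NULL, rate(n = 10, period = 0.1))

# the 11th call can't start until the first call's hold expires,
# so 11 calls should take at least 0.1 seconds in total
elapsed <- system.time(for (i in 1:11) f())[["elapsed"]]
elapsed  # observed between 0.14 and 0.16 seconds in the runs above
```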

stephlocke commented 7 years ago

Hey @tarakc02, how's it going on this? Having success? This might be handy: https://twitter.com/eddelbuettel/status/809739499785830400

tarakc02 commented 7 years ago

@stephlocke: Yes, thanks for checking in! The current version passes all tests and no longer uses a "buffer" variable to wait an additional (unexplained) amount of time. I hadn't seen the nanotime package when I was working on this, but I have been using microbenchmark::get_nanotime (not quite as ambitious, but similar in spirit, I suppose). After some confusion/consternation about the failing tests, I realized that the problem might be in the test code itself (hinted at by the fact that the errors were in the 1e-13 to 1e-17 second range), so I replaced the calls to system.time() in the tests with a timer based on microbenchmark::get_nanotime.
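The replacement timer amounts to something like this (the time_it wrapper is a hypothetical illustration; microbenchmark::get_nanotime() is the real nanosecond-resolution clock):

```r
# measure elapsed seconds with a nanosecond-resolution clock,
# avoiding the coarser resolution of system.time()
time_it <- function(expr) {
  start <- microbenchmark::get_nanotime()
  force(expr)  # evaluate the expression being timed
  (microbenchmark::get_nanotime() - start) / 1e9  # ns -> seconds
}

# e.g. time_it(for (i in 1:11) f())
```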

I've successfully run longer versions of the unit tests (testing thousands of calls rather than dozens/hundreds), and successfully used the rmapzen package, which depends on ratelimitr, to geocode around 25,000 addresses without hitting a rate limit error in the process.

Next steps: write up a short vignette and submit to CRAN (R CMD check is already passing).