tarakc02 / ratelimitr

Rate limiting for R functions
Other
40 stars 0 forks source link

Set rate from options() at execution time #15

Closed dpprdan closed 6 years ago

dpprdan commented 6 years ago

I don’t know whether this is an issue / feature request or just a question with an obvious solution. Apologies if it turns out to be the latter, but even then it might still be useful for the vignette?

Anyway, I would like to use ratelimitr in a package, but have the rate set at execution time. I.e. I want the user to be able to (re-)set it easily via options() after the package has been loaded. Is that possible?

The following obviously does not work, because getOption is called at load/build time.

library(ratelimitr)
f <- function() NULL
options(ratelimit = 5)
# rate = 5 set via Options
f_lim <- limit_rate(f, rate(n = getOption("ratelimit"), period = .1))
f_lim
#> A rate limited function, with rates (within 1/60 seconds):
#>      5 calls per 0.1 seconds
#> function() NULL
# reducing ratelimit in options
options(ratelimit = 4)
# but f_lim does not know about it
f_lim
#> A rate limited function, with rates (within 1/60 seconds):
#>      5 calls per 0.1 seconds
#> function() NULL
# function has to be loaded again
f_lim <- limit_rate(f, rate(n = getOption("ratelimit"), period = .1))
f_lim
#> A rate limited function, with rates (within 1/60 seconds):
#>      4 calls per 0.1 seconds
#> function() NULL
tarakc02 commented 6 years ago

Currently, there isn't any way to modify the rate of an existing rate-limited function -- limit_rate always creates a new function with a rate limit that is fixed at the time of creation. I've seen examples where packages use options or environment variables to store credentials, and then the rate is based on whether the user has the appropriate credentials, but as in your example, this happens once when the package is loaded.

If it were possible to update the rate limits for existing functions, what would be the appropriate behavior in the following case:

f_lim <- limit_rate(f, rate(n = 3, period = 1))
replicate(3, f_lim())

# update the rate limit
update_limit(f_lim, rate(n = 5, period = 1))
f_lim()

Should that final call of f_lim() respect the original 3/second limit by waiting for a full second to have elapsed before continuing, or should it move forward as though the rate limit were 5/second and 3 calls have been made so far, or should it reset the internal counter to 0 and move forward with the new rate limit? I'm not fully sure what I can implement, just trying to think through different ideas . . ..

For now, here are some kind of hacky approaches for proceeding:

new_fun <- set_rate(fun, 5)

(that doesn't seem any better than just advising the user to use ratelimitr directly).

fun <- function(data, rate = getOption("ratelimit")) {
  f_lim <- limit_rate(some_fn, rate(n = rate, period = .1))
  lapply(data, f_lim)
}

In that case, the specified rate is obeyed for the duration of the execution of fun. If the user then calls fun again (with the same limit or a new limit), it will not have any memory of the previous execution, so you would still potentially get rate limit errors if you're making calls to a web API, but depending on the use-case it might make sense...

Hmmn. I'll keep thinking on it. Let me know if I've misunderstood what you're trying to do or anything.

dpprdan commented 6 years ago

but as in your example, this happens once when the package is loaded.

Unless I screwed up something in my tests, the rate is even set at package build time and does not get updated when the package is loaded?! So options set in .Rprofile are not used, unless one provides the rate with each function call like you do in the high-level-function example.

The minimum requirement IMO would be to be able to set the rate at package loading time (presumably with .onLoad). Even better, if it would be possible to (re)set the rate with some kind of setup function.

I don't think you misunderstood, but to use a more concrete example nevertheless: With rmapzen::mz_search one can now query geocode.earth and NYC Planning Labs GeoSearch. Geocode.earth offers three packages with 5, 10 and 25+ per second query limits for search, while NYC GeoSearch does not have a rate limit ATM AFAIK. So IMO it would not even make much sense to set the rate at package load time with .onLoad in this case. With the current package setup it would make more sense to set it with mz_set_search_host_geocode.earth and mz_set_search_host_nyc_geosearch() respectively, I think.

Should that final call of f_lim() respect the original 3/second limit by waiting for a full second to have elapsed before continuing, or should it move forward as though the rate limit were 5/second and 3 calls have been made so far, or should it reset the internal counter to 0 and move forward with the new rate limit?

I guess the safest way would be to wait the full second, but I am not sure whether this matters much in practice. I'd guess that the initial setup would be the much more common use case than a reset. (Unless you'd want to use different accounts or use different APIs with the same function (mz_search) in the same session (e.g. compare geocode.earth and NYC GeoSearch or another Pelias Server).

provide the user with a function that returns functions [...] (that doesn't seem any better than just advising the user to use ratelimitr directly).

I think that it would be better, see my example above with mz_set_search_host which hides some of the complexity of setting up the correct API arguments. In addition my current use case is quite similar to {rmapzen}, but with the {opencage} package. There we also use the {memoise} package, so the query basically goes through memmoise::memoise(ratelimitr::limit_rate(GET())). I think it could be challenging for users not familiar with these packages to set this up correctly.

Regarding the "higher level function": If possible, I'd like to avoid another argument to the function (with the query arguments there can already be more than a handful), and again, I don't think that this will be used more than once or maybe twice per session.

tarakc02 commented 6 years ago

Thanks for the details and context. Yes, I think I know the way to get this working -- the way limit_rate works is it creates a bunch of holds (called gatekeepers in the code) that expire after some time, and when you call a rate limited function, it first grabs the oldest hold in the queue, and if the hold is not yet expired, it waits until expiration before executing (and creating a new hold with a new expiration time to replace the one it took out). To update the rate limits for a function, it is only necessary to swap out the gatekeepers in the rate-limited function's environment with a new set reflecting the new rate(s). I was originally hesitant to include functionality in this package that modified functions (that's why right now, all methods return a new function, including reset), but it makes sense that within a package you'd just want the one function that can have different rates set by the user. I'll give this a shot this weekend.

tarakc02 commented 6 years ago

@dpprdan -- the devel branch now includes a function UPDATE_RATE which allows you to take an existing rate limited function and update the rates. I made the function name all caps as a sort of cue that this modifies an existing object rather than returning a new one. anyway -- can you let me know if it does what you need it to?

note: when you run UPDATE_RATE, the old rate limits and related function call history are ignored and forgotten. that means it is possible for the updated function to overrun the old rate that was replaced. For example, say you had previously set a rate limit of 5 per 1 second and you executed the function 5 times without a problem. When you run UPDATE_RATE, the new rate limits go into effect and the history is wiped clean, so you can call the function and it will run without any delay, even though the first call of the function could in theory overrun the previous rate limit that you replaced.

dpprdan commented 6 years ago

Thanks a lot, Tarak! I only had time to test this quickly today, but it's looking very good so far. IMHO the all-caps is not necessary (it clearly states that it is an "update", after all), but that's your call, obviously.

Like I said, I don't think it matters much that the old limits are ignored. If one wants to play it safe a Sys.sleep(1) (or whatever the max(period) is) should suffice, shouldn't it?

I will test more the next couple of days.

dmi3kno commented 6 years ago

I think update_rate should be in the standard arsenal of the package. We need a separate function to modify parameters of an adverb (which ratelimitr is).

I used ratelimitr in my polite package and the way I dealt with updating the rate is that I made rate-limited function a method of the main class object (object of class polite created by bow constructor). Essentially, I am modifying the method and passing along the modified object to other functions, so environment of the function is preserved.

tarakc02 commented 6 years ago

Thanks @dmi3kno! And thanks for sharing the example, I like the approach you take.

For update_rate, I'm happy moving forward with what's on the devel branch. @dpprdan -- have you noticed any problems with that implementation? I can submit the update to CRAN if not.

(edited after the fact to add: @dmi3kno's polite package looks like it is going to be really useful -- nice work on that!)

dpprdan commented 6 years ago

@tarakc02 Sorry for not getting back to you earlier. Everything has been working fine so far. There aren't a lot of edge cases to test, though I guess, so it pretty much works or it does not? And in this case it does. 👍

tarakc02 commented 6 years ago

Thanks, the update should be hitting CRAN in the next couple of days.

dpprdan commented 6 years ago

Version 0.4.0 is on CRAN. grafik

dmi3kno commented 5 years ago

Does it make sense to have a get_rate(lf,...) function that can say if a particular function in the relevant environment has been rate limited (TRUE/FALSE) and if yes, what is the current rate it is bound by?

dpprdan commented 5 years ago

@dmi3kno: This sounds pretty much like what get_rates() does.

dmi3kno commented 5 years ago

Sorry. Just discovered the same. Apologize. Thanks for UPDATE_RATE