judgelord / regulationsdotgov

a package to get data from regulations.gov
https://judgelord.github.io/regulationsdotgov/
MIT License

option to use multiple API keys #4

Open judgelord opened 5 months ago

judgelord commented 5 months ago

some users may have higher rate limits: https://www.regulations.gov/faq

delays should default to match the default rate limit, with local and global options to change them.

we may also want an option to supply multiple keys to cycle through, in which case the delay would be divided by the number of keys provided
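A minimal sketch of the delay arithmetic described above. The function name `request_delay` and the default of 1000 requests per hour are assumptions for illustration, not values taken from this thread or confirmed against the package.

```r
# Seconds to wait between requests when rotating through `n_keys` keys,
# assuming each key allows `limit_per_hour` requests per hour.
# NOTE: limit_per_hour = 1000 is an assumed default, not a confirmed value.
request_delay <- function(n_keys = 1, limit_per_hour = 1000) {
  stopifnot(n_keys >= 1, limit_per_hour > 0)
  base_delay <- 3600 / limit_per_hour  # delay needed for a single key
  base_delay / n_keys                  # divided by the number of keys
}

request_delay(1)  # 3.6 seconds with one key
request_delay(3)  # 1.2 seconds with three keys
```

A user with a higher limit would pass their own `limit_per_hour` rather than rely on the assumed default.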

judgelord commented 2 months ago

Since functions now take rate limits into account automatically, we no longer need an option to specify rate limits, but it would still speed things up to be able to cycle through keys.

judgelord commented 2 months ago

get_searchTerm_batch.R implements this with an argument "api_keys" replacing "api_key" and adding the following to the

# Extract the most recent x-ratelimit-remaining and pause if it is less than 20 (since we call 20 pages at a time)
  remaining <<- map(result, headers) |>
    tail(1) |>
    pluck(1, "x-ratelimit-remaining") |>
    as.numeric()

  if (remaining < 20) {

    message(paste(Sys.time() |> format("%X"), "- Hit rate limit, will continue after one minute"))

    # Rotate keys: move the first key to the end, then use the new first key.
    # Note: `<<-` assigns in an enclosing environment; if api_keys is also a local
    # variable or argument here, the next line still reads the unrotated local copy.
    api_keys <<- c(tail(api_keys, -1), head(api_keys, 1))
    api_key <- api_keys[1]
    message(paste("Rotating to api key", api_key))

    Sys.sleep(60)
  }
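
The rotation step in the snippet above could also be written as a pure function that returns the new ordering, which avoids depending on `<<-`. This is only a sketch: `rotate_keys` and `mask_key` are hypothetical helper names, and the keys shown are placeholders.

```r
# Move the first key to the end of the vector and return the new ordering.
rotate_keys <- function(api_keys) {
  c(tail(api_keys, -1), head(api_keys, 1))
}

# Show only the last four characters so full keys do not end up in logs.
mask_key <- function(key) {
  paste0("...", substr(key, nchar(key) - 3, nchar(key)))
}

api_keys <- c("KEY_A", "KEY_B", "KEY_C")  # placeholder keys
api_keys <- rotate_keys(api_keys)         # reassign at the call site
api_key  <- api_keys[1]                   # now "KEY_B"
message("Rotating to api key ", mask_key(api_key))
```

Reassigning the result at the call site makes it explicit which copy of `api_keys` is being updated.
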

judgelord commented 2 months ago

FYI, @mzkhuzam2 -- I made breaking changes to get_comment_details. It now requires an api_keys argument. It should be able to use an api_keys object in the environment as the default, but that was not working for me for some reason. Either way, an "api_key" object in the environment is no longer sufficient, though perhaps we should make that the default instead.
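
One possible cause of the default not working, offered as a guess since the function definition is not shown here: if the default is written as `api_keys = api_keys`, R evaluates the default inside the function and fails with a "promise already under evaluation" error. A sketch of an explicit lookup that falls back from `api_keys` to `api_key` follows; `resolve_api_keys` is a hypothetical helper, not part of the package.

```r
# Return the keys to use: the argument if supplied, otherwise an `api_keys`
# object in `envir`, otherwise a single `api_key` object in `envir`.
resolve_api_keys <- function(api_keys = NULL, envir = globalenv()) {
  if (!is.null(api_keys)) return(api_keys)
  for (name in c("api_keys", "api_key")) {
    # mode = "character" skips functions or other objects with the same name
    found <- get0(name, envir = envir, mode = "character", inherits = FALSE)
    if (!is.null(found)) return(found)
  }
  stop("No API keys supplied and no `api_keys` or `api_key` object found.")
}

# Example with a throwaway environment standing in for the user's workspace
workspace <- new.env()
assign("api_key", "KEY_A", envir = workspace)  # placeholder key
resolve_api_keys(envir = workspace)            # falls back to "KEY_A"
```

A function such as get_comment_details could then take `api_keys = NULL` and call the helper on its first line, so an existing `api_key` object would keep working.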

judgelord commented 1 month ago

Currently, when a rate limit is hit, we rotate keys AND pause for one minute.

A better approach would be to rotate keys without pausing and then, if it rotates through all of them in less than an hour, pause. This could be done by storing the last key and a timestamp at the very beginning and then whenever the last key is hit, it pauses and resets the timestamp.

At the start:

time <<- Sys.time()
last_key <<- tail(api_keys, 1)

Inside the batch:

if (as.numeric(difftime(Sys.time(), time, units = "hours")) < 1 && api_key == last_key && remaining == 0) {
  message("Tried all api keys, pausing for one minute")
  Sys.sleep(60)
  time <<- Sys.time()
}
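
The same idea can be sketched without globals by carrying the rotation state in a list. Everything here (`new_key_state`, `advance_key`, the field names, and the `window_secs` and `pause_secs` defaults) is a hypothetical illustration of the approach described above, not the package's implementation.

```r
# State: the keys, the index of the key in use, and when this pass started.
new_key_state <- function(api_keys, now = Sys.time()) {
  list(api_keys = api_keys, index = 1L, cycle_start = now, wait = 0)
}

# Call when the current key is exhausted (remaining == 0 or a 429 response).
# Returns the updated state; `state$wait` is the number of seconds to sleep.
advance_key <- function(state, now = Sys.time(),
                        window_secs = 3600, pause_secs = 60) {
  n <- length(state$api_keys)
  wait <- 0
  if (state$index == n) {
    # The last key is exhausted, so every key has been tried in this pass.
    elapsed <- as.numeric(difftime(now, state$cycle_start, units = "secs"))
    if (elapsed < window_secs) wait <- pause_secs
    state$index <- 1L
    state$cycle_start <- now + wait  # reset the timestamp after the pause
  } else {
    state$index <- state$index + 1L  # move on to the next key without pausing
  }
  state$wait <- wait
  state
}

t0 <- as.POSIXct("2024-01-01 00:00:00", tz = "UTC")
state <- new_key_state(c("KEY_A", "KEY_B"), now = t0)  # placeholder keys
state <- advance_key(state, now = t0 + 10)  # KEY_A exhausted: rotate, wait is 0
state <- advance_key(state, now = t0 + 20)  # KEY_B exhausted too: wait is 60
# In a real loop: Sys.sleep(state$wait); api_key <- state$api_keys[state$index]
```

Passing `now` explicitly keeps the logic testable without actually sleeping.
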

judgelord commented 5 days ago

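In get_comment_details, keys are either failing to rotate or the rotation is not being reflected in the message: every rotation message says "rotating api key to 2". The messages are also behaving oddly, with the two messages for each document saying different things (429 with 0 remaining, then 200 with hundreds remaining). It seems that the "trying again" message is correct, since these data did come through: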

| EPA-HQ-OAR-2021-0317-0188 | status: 429 | limit-remaining 0 |
429 - rotating api key to 2
| Trying EPA-HQ-OAR-2021-0317-0188 again | now 200 | 672 remaining |
| EPA-HQ-OAR-2021-0317-0187 | status: 429 | limit-remaining 0 |
429 - rotating api key to 2
| Trying EPA-HQ-OAR-2021-0317-0187 again | now 200 | 671 remaining |
| EPA-HQ-OAR-2021-0317-0184 | status: 429 | limit-remaining 0 |
429 - rotating api key to 2
| Trying EPA-HQ-OAR-2021-0317-0184 again | now 200 | 670 remaining |
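
One possible explanation for the log above, not confirmed against the package source: if the rotated index (or key vector) is only assigned locally, each call starts again from key 1, hits the 429, and reports a rotation to key 2. The sketch below demonstrates that scoping behavior with hypothetical names (`key_index`, `rotate_local`, `rotate_index`).

```r
key_index <- 1L  # hypothetical global tracking which key is in use

# Local assignment: the global `key_index` is never updated, so every call
# starts from 1 and reports 2.
rotate_local <- function(n_keys) {
  key_index <- key_index %% n_keys + 1L  # creates a local copy only
  key_index
}

rotate_local(3)  # 2
rotate_local(3)  # 2 again, because the global is still 1

# Returning the new index and reassigning it at the call site persists it.
rotate_index <- function(index, n_keys) index %% n_keys + 1L

key_index <- rotate_index(key_index, 3)  # 2
key_index <- rotate_index(key_index, 3)  # 3
key_index <- rotate_index(key_index, 3)  # wraps back to 1
```

If this is what is happening, the fix would be to return the rotated state from the function (or update it with `<<-` in the right environment) so the next document starts from the key that last succeeded.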