pachadotdev / analogsea

Digital Ocean R client
https://pacha.dev/analogsea/
Apache License 2.0

Hitting API rate limit when creating docklets in parallel #108

Closed: simecek closed this issue 8 years ago

simecek commented 8 years ago

I use analogsea to start DO machines for course participants (https://github.com/churchill-lab/sysgen2015). To send the same set of instructions to more than 30 docklets, I am using a doParallel/foreach loop. For example, this pulls the "churchill/doqtl" image to all docklets:

```r
# pulling docker images
foreach(i = 1:N, .packages = "analogsea") %dopar% {

  # select droplet
  d <- droplet_list[[i]]

  # pull docker images
  d %>% docklet_pull("rocker/hadleyverse")
  d %>% docklet_pull("churchill/doqtl")
  d %>% docklet_images()
}
```

The problem appeared when I tried to parallelize docklet_create:

```r
# starting docklet
droplet_list <- foreach(i = 1:N, .packages = "analogsea") %dopar% {
  docklet_create(size = getOption("do_size", "8gb"),
                 region = getOption("do_region", "nyc2"))
}
```

For some reason, the package sent a huge number of API requests and hit the 5,000 requests/hour API rate limit within a few seconds. I filed a ticket with Digital Ocean and got a graph of the number of requests per 5 minutes.

[Figure: graph from Digital Ocean support showing the number of API requests per 5 minutes]

When I use a plain for loop instead of foreach, everything works fine (but slowly).

```r
droplet_list <- list()
for (i in 1:N) {
  print(i)
  # start i-th machine
  droplet_list[[i]] <- docklet_create(size = getOption("do_size", "8gb"),
                                      region = getOption("do_region", "nyc2"))
}
```

I believe this is not an issue with foreach or Digital Ocean, but a problem with docklet_create.

sckott commented 8 years ago

Thanks for the report, @simecek. I'll have a look.

sckott commented 8 years ago

Hi again. Okay, I made a few small changes, so reinstall from GitHub with devtools::install_github("sckott/analogsea").

The wait parameter is the key here. It is TRUE by default, which means we ping the DO API every second to check whether the droplet is up yet. Once it's up, we exit the function call and return the droplet object.

You can set it to FALSE to skip all of those API pings. The returned object will then be missing the IP address, but you can do your own pinging until the droplet is up, or simply wait until it is up, and then call droplet(d$id) to refresh the object's metadata.
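Doing "your own pinging" with wait = FALSE amounts to a polling loop with a sleep between checks. The sketch below is a hypothetical helper, poll_until(), that is not part of analogsea. It takes any check function so it can be tried without a DigitalOcean account. The commented usage assumes that the object returned by droplet() has a `status` field that becomes "active" once the machine is up.

```r
# Hypothetical helper, not part of analogsea: call `check()` every
# `interval` seconds until it returns TRUE. Returns the number of checks
# made (i.e. API requests, if `check` calls the API).
poll_until <- function(check, interval = 30, max_tries = 20) {
  for (attempt in seq_len(max_tries)) {
    if (isTRUE(check())) {
      return(attempt)
    }
    # Sleep only if another check will follow.
    if (attempt < max_tries) {
      Sys.sleep(interval)
    }
  }
  stop("condition not met after ", max_tries, " checks")
}

# Usage sketch (needs a DigitalOcean account, so it is commented out).
# Assumes droplet(id)$status becomes "active" once the droplet is up.
# d <- docklet_create(wait = FALSE)
# poll_until(function() droplet(d$id)$status == "active", interval = 30)
# d <- droplet(d$id)  # refresh metadata, including the IP address
```

A longer interval means fewer requests per droplet. The trade-off is that you may notice a ready droplet up to `interval` seconds late.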

I added an option, do.wait_time, that you can set. Its default is 1 second. So if you still want the wait to happen (pinging every X seconds until the droplet is up), you can use whatever interval you like.
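A back-of-the-envelope sketch shows why the interval matters when many docklets are created at once. It assumes the 5,000 requests/hour limit from the report. It also assumes each of the 31 parallel workers sends one status request per interval for the whole time. The create request itself and any other calls docklet_create makes are ignored, so the real budget would run out somewhat sooner. The function name seconds_to_rate_limit is hypothetical.

```r
# Rough estimate: seconds until polling alone uses up the hourly budget,
# assuming `n_droplets` workers each send one request every `wait_time`
# seconds and keep polling the whole time. Other API traffic is ignored.
seconds_to_rate_limit <- function(n_droplets, wait_time, limit_per_hour = 5000) {
  requests_per_second <- n_droplets / wait_time
  limit_per_hour / requests_per_second
}

seconds_to_rate_limit(31, 1)   # about 161 seconds
seconds_to_rate_limit(31, 30)  # about 4839 seconds, roughly 81 minutes
```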

It makes sense that the for loop took much longer than foreach: with wait = TRUE, each droplet had to finish spinning up before the next one could start.

Let me know if the changes help.

simecek commented 8 years ago

Hi sckott,

I reinstalled analogsea from GitHub and set do.wait_time to 30. I hit the API error later than before, but I still hit it. I suspect that Sys.sleep in action_wait somehow does not work (i.e., runs faster) when run in parallel, as in the code below:

```r
library(parallel)
library(doParallel)
library("analogsea")

N <- 31
cl <- makeCluster(N)
registerDoParallel(cl)
options(do.wait_time = 30)

droplet_list <- foreach(i = 1:N, .packages = "analogsea") %dopar% {
  docklet_create(size = getOption("do_size", "512mb"),
                 region = getOption("do_region", "nyc2"))
}
```

However, when I set wait to FALSE, everything works fine, and, as you suggested, I used the droplet function to get the IP address later.

Thank you very much for your help. From my perspective, the issue is resolved.

sckott commented 8 years ago

@simecek Glad it's resolved.

> I am suspicious that Sys.sleep in action_wait somehow does not work (=runs faster) when processed in parallel (as below)

Do you know whether your rate limit was already at its max when you tried that? I'll test this and see whether the wait time is ignored.

simecek commented 8 years ago

I re-ran the code and found the bug: do.wait_time needs to be set inside the foreach loop. With the modified version below, everything works fine and I no longer get the API error. Thank you once more.

```r
library(parallel)
library(doParallel)
library("analogsea")

N <- 31
cl <- makeCluster(N)
registerDoParallel(cl)

droplet_list <- foreach(i = 1:N, .packages = "analogsea") %dopar% {
  # set the option on each worker process
  options(do.wait_time = 30)
  docklet_create(size = getOption("do_size", "512mb"),
                 region = getOption("do_region", "nyc2"))
}

# shut down the worker processes when done
stopCluster(cl)
```
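The option has to be set inside the loop because of how the default PSOCK cluster from makeCluster() works. Each worker is a separate R process, so options() set in the master session are not visible on the workers, and docklet_create running there would use the default 1-second interval. The minimal sketch below uses only the parallel package and makes no DigitalOcean calls. It shows the option missing on the workers. It also shows clusterEvalQ() as an alternative that sets the option once per worker instead of on every iteration. With registerDoParallel(cl), foreach runs on these same worker processes.

```r
library(parallel)

options(do.wait_time = 30)   # set in the master session only
cl <- makeCluster(2)         # PSOCK workers: fresh R processes

# The option set in the master session is not present on the workers.
before <- clusterEvalQ(cl, getOption("do.wait_time"))

# Set it once on every worker instead of inside each loop iteration.
invisible(clusterEvalQ(cl, options(do.wait_time = 30)))
after <- clusterEvalQ(cl, getOption("do.wait_time"))

stopCluster(cl)
```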
sckott commented 8 years ago

Great, glad it worked. I'll make a note in the docs about this so other users don't have to run into the same problem.