Closed simecek closed 8 years ago
Thanks for the report @simecek - I'll have a look
Hi again. Okay, I made a few small changes, so please reinstall from GitHub: devtools::install_github("sckott/analogsea")
The wait parameter is the key here. It is TRUE by default, which means we ping the DO API every 1 second to check whether the droplet is up yet. Once it's up, we exit the function call and return the droplet object.
You can set this to FALSE and skip those API pings. The returned object will be missing the IP address, though, so you can either do your own pinging until the droplet is up, or wait until it's up and then call droplet(d$id) to refresh the object's metadata.
I added an option, do.wait_time, that you can set. Its default is 1 second. So if you still want the wait to occur (pinging every X seconds until the droplet is up), you can do that with whatever time interval you like.
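To illustrate the polling behaviour described above, here is a minimal sketch in base R. It is not analogsea's actual code: wait_until_ready and make_fake_check are hypothetical names, and the fake check stands in for a real API status call. Only the do.wait_time option name and its 1 second default come from the comment above.

```r
# Hypothetical helper: poll `is_ready()` until it returns TRUE.
# The pause between checks is read from the do.wait_time option
# (falling back to 1 second). This is a sketch, not analogsea's code.
wait_until_ready <- function(is_ready, max_tries = 100) {
  interval <- getOption("do.wait_time", 1)
  for (attempt in seq_len(max_tries)) {
    if (is_ready()) {
      return(attempt)  # number of checks that were needed
    }
    Sys.sleep(interval)
  }
  stop("resource did not become ready after ", max_tries, " checks")
}

# Stand-in for an API status check: reports ready on the third call.
make_fake_check <- function(ready_after = 3) {
  calls <- 0
  function() {
    calls <<- calls + 1
    calls >= ready_after
  }
}

options(do.wait_time = 0.1)  # short interval so the demo finishes quickly
checks_used <- wait_until_ready(make_fake_check(3))
```

Each check in a loop like this corresponds to one API request, so a longer interval directly reduces the number of requests sent while waiting.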
It makes sense that for would take a lot longer than foreach, since you had wait=TRUE, so each droplet spin-up had to finish before the next could start.
let me know if the changes help.
Hi sckott,
I reinstalled analogsea from GitHub and set do.wait_time to 30. The API error showed up later than before, but I still hit it. I suspect that Sys.sleep in action_wait somehow does not work (= runs faster) when processed in parallel (as below):
library(parallel)
library(doParallel)
library("analogsea")
N <- 31
cl <- makeCluster(N)
registerDoParallel(cl)
options(do.wait_time=30)
droplet_list <- foreach(i = 1:N, .packages="analogsea") %dopar% {
  docklet_create(size = getOption("do_size", "512mb"),
                 region = getOption("do_region", "nyc2"))
}
However, when I set wait to FALSE everything works fine and, as you suggested, I used the droplet function to get the IP later.
Thank you very much for your help. From my perspective the issue is resolved.
@simecek Glad it's resolved.
I am suspicious that Sys.sleep in action_wait somehow does not work (=runs faster) when processed in parallel (as below)
Do you know whether your rate limit was already at its max when you tried that? I'll test this out and see if the wait time is ignored.
I re-ran the code and found the bug: do.wait_time needs to be set inside the foreach loop. With the modified version below, everything works fine and I do not get the API error. Thank you once more.
library(parallel)
library(doParallel)
library("analogsea")
N <- 31
cl <- makeCluster(N)
registerDoParallel(cl)
droplet_list <- foreach(i = 1:N, .packages="analogsea") %dopar% {
  options(do.wait_time=30)
  docklet_create(size = getOption("do_size", "512mb"),
                 region = getOption("do_region", "nyc2"))
}
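The likely reason the option has to be set inside the loop is that makeCluster starts separate R worker processes, and options set in the master session are not copied to them. The sketch below shows this with the parallel package alone (no droplets are created), and also shows a possible alternative: setting the option once on every worker with clusterEvalQ. The two-worker cluster and the variable names are only for illustration.

```r
library(parallel)

cl <- makeCluster(2)  # default cluster type: separate R processes

# Set only in the master session, as in the first attempt.
options(do.wait_time = 30)
master_only <- unlist(clusterEvalQ(cl, getOption("do.wait_time", 1)))

# Set on every worker, then read the value back from the workers.
invisible(clusterEvalQ(cl, options(do.wait_time = 30)))
on_workers <- unlist(clusterEvalQ(cl, getOption("do.wait_time", 1)))

stopCluster(cl)
```

Here master_only should hold the fallback value 1 for each worker, while on_workers should hold 30. If the same cluster is registered with registerDoParallel(cl), calling clusterEvalQ before the foreach loop should have the same effect as calling options() inside the loop body.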
Great, glad it worked. I'll make a note in the docs about this so other users don't have to run into the same problem.
I use analogsea to start DO machines for course participants (https://github.com/churchill-lab/sysgen2015). To send the same set of instructions to >30 docklets, I am using a doParallel/foreach loop, for example to pull the "churchill/doqtl" image to all docklets.
The problem arose when I tried to parallelize docklet_create: for some reason, the package sent a crazy number of API requests and hit the 5000/hour API rate limit in a few seconds. I filed a ticket with Digital Ocean and got a graph of the number of requests per 5 minutes.
When I use for instead of foreach, everything is fine (but slow). I believe it is not an issue with foreach or Digital Ocean but a problem with docklet_create.
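A rough back-of-envelope estimate shows why the polling interval matters here. The sketch assumes each waiting droplet causes exactly one API request per interval and ignores every other request the package makes, so it describes a sustained polling rate only, not the actual traffic in the graph. The 5000/hour limit, the 31 droplets, and the 1 and 30 second intervals are taken from the thread; requests_per_hour is a hypothetical helper.

```r
rate_limit_per_hour <- 5000  # limit quoted in the report
n_droplets <- 31             # N in the example code

# Sustained requests per hour if every droplet is polled once per interval.
requests_per_hour <- function(n, wait_seconds) n * 3600 / wait_seconds

default_rate <- requests_per_hour(n_droplets, 1)   # do.wait_time = 1
slower_rate  <- requests_per_hour(n_droplets, 30)  # do.wait_time = 30

over_limit_default <- default_rate > rate_limit_per_hour
over_limit_slower  <- slower_rate > rate_limit_per_hour
```

Under these assumptions, 31 droplets polled in parallel every second would far exceed the hourly limit if sustained, while polling every 30 seconds would stay under it.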