Closed kendonB closed 6 years ago
Interesting. It is certainly possible, but maybe a niceness level in the `system2()` call would be easier. I will consider a backoff function too, but I am concerned that it might slow down workflows with quick targets.
Highly doubtful it would slow anything down if the initial value is set low enough. The first value could be set well lower than 0.1, I believe. Of course, note that an exponential backoff resets when something changes.
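For readers new to the idea, here is a minimal sketch of an exponential backoff that resets when something changes. It is written in Python for illustration only and is not drake's or batchtools' code; the names `backoff_delay` and `poll`, the 0.01 second base, and the 120 second cap are all assumptions chosen to match the numbers discussed in this thread.

```python
import time

def backoff_delay(idle_count, base=0.01, factor=2.0, cap=120.0):
    """Seconds to sleep after `idle_count` consecutive idle checks."""
    return min(cap, base * factor ** idle_count)

def poll(check, max_checks, sleep=time.sleep):
    """Call `check()` up to `max_checks` times, backing off while idle.

    `check` returns True when something changed since the last call.
    Returns the list of delays that were requested, for inspection.
    """
    delays = []
    idle_count = 0
    for _ in range(max_checks):
        if check():
            # Progress was made: go back to the shortest delay.
            idle_count = 0
        else:
            delay = backoff_delay(idle_count)
            delays.append(delay)
            sleep(delay)
            idle_count += 1
    return delays
```

With a small base, the first few idle checks cost only hundredths of a second, so quick targets should see little added latency, while a long idle stretch converges to the cap.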
Do you know of similar tools that use an exponential backoff like this? The idea is new to me, and it would be nice to see other places where it plays out.
I believe it's in batchtools somewhere - can look when I'm back at my computer
Thanks.
@mschubert, what would you recommend?
Aha, I see it at https://github.com/mllg/batchtools/blob/3b0b1a9a59e377bb4d827e355d6955d66849c9e6/R/sleep.R#L4. I will probably end up adding an optional `sleep` argument to `make()`, but with the default still set to `function(i){0.01}`. Unlike `batchtools`, `drake` needs to accommodate local lightweight parallelism too, and I think the default options should accommodate small workflows with low overhead. In any case, the default is easy to change later.
Yes, every tool that has to keep checking whether new data is available, instead of being notified when it is, will incur a CPU cost that depends on the check interval.
I'm not sure I fully understand the issue, but I guess that's what's going on here.
Backing off the interval based on a low frequency of positive checks makes sense here if a passive notification of result availability is not possible.
However, if `drake` processes long calls first and short calls later, this should be handled as well (i.e., enabling the check interval to become both longer with no results and shorter with results).
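One way to read the suggestion above is as an interval that adapts in both directions. The following is a rough Python sketch under that assumption; `next_interval`, the grow and shrink factors, and the bounds are hypothetical and not taken from drake or clustermq.

```python
def next_interval(current, got_result, lo=0.01, hi=120.0, grow=2.0, shrink=0.5):
    """Return the next check interval in seconds.

    The interval is lengthened after an empty check and shortened after a
    check that found a result, and it is always clamped to [lo, hi].
    """
    factor = shrink if got_result else grow
    return max(lo, min(hi, current * factor))
```

Unlike a hard reset to the minimum, shrinking gradually keeps the interval from collapsing after a single result, which may or may not be what you want when short calls follow long ones.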
Thanks! I will continue thinking about backing off the interval as a default. In the meantime, I will merge #545 after the builds complete so users can insert their own backoff functions.
As far as I know, Shiny's reactivity model uses a passive notification system based on callback functions. If I were to rearchitect `drake` from scratch, I would try to use something like this. Not only would it cut down unnecessary sleeping while minimizing the CPU load, it might also do away with the need to construct the entire dependency graph. I think this is the crux of `drake`'s overhead issues for ~10000+ targets and the dynamic dependency relationships required for #233 and #304.
> it might also do away with the need to construct the entire dependency graph
I'm not sure how this would be possible?
FWIW, if you use `clustermq` as a backend, polling the result socket should not incur any significant CPU cost if no results arrive (and in this case you wouldn't want to add a delay between polls; it has no advantage except limiting result processing if results arrive very quickly).
I am not sure if it's possible either, but I think it deserves some thought. With a passive notification model, `drake` could start with targets with no dependencies. When those targets finish, they could broadcast to the rest of the targets, and that could trigger targets that no longer have anything holding them back.
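To make the idea concrete, here is a small Python sketch of notification-driven scheduling. It is essentially Kahn's topological-sort algorithm, not drake's scheduler, and `run_by_notification`, `deps`, and `build` are hypothetical names. Note that each target still has to know its direct upstream targets, so this avoids sleeping but does not by itself remove the need for dependency information.

```python
from collections import defaultdict, deque

def run_by_notification(deps, build):
    """Build targets as soon as nothing holds them back.

    `deps` maps every target to the list of targets it depends on.
    `build` is called once per target. Returns the build order.
    """
    # Number of unfinished upstream targets for each target.
    waiting = {target: len(set(upstream)) for target, upstream in deps.items()}
    # Who should be notified when a given target finishes.
    listeners = defaultdict(list)
    for target, upstream in deps.items():
        for u in set(upstream):
            listeners[u].append(target)

    # Start with the targets that have no dependencies.
    ready = deque(t for t, n in waiting.items() if n == 0)
    order = []
    while ready:
        target = ready.popleft()
        build(target)
        order.append(target)
        # "Broadcast" completion to the targets waiting on this one.
        for listener in listeners[target]:
            waiting[listener] -= 1
            if waiting[listener] == 0:
                ready.append(listener)
    return order
```

This sketch runs serially; in a parallel setting the `ready` queue would be fed by worker completion events instead of the loop itself.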
I did not think `clustermq` would throttle the CPU, and I am glad you confirmed this. Is this taken care of in `w$receive_data()`? If not, is there anything you think I should change in `drake`'s `cmd_master()` function?
Fixed via #545.
`w$receive_data()` will block until a result arrives (at negligible CPU cost), so that's all fine; no need to do anything extra.
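As a generic illustration of why a blocking receive needs no sleep loop, here is a Python sketch that uses the standard library's `queue.Queue` as a stand-in for the result socket. It does not use or imitate the clustermq API; `wait_for_result` and the `"target_done"` message are made up for the example.

```python
import queue
import threading
import time

def wait_for_result(delay=0.05):
    """Block until a background worker delivers a result, then return it."""
    results = queue.Queue()

    def worker():
        time.sleep(delay)            # pretend to do some work
        results.put("target_done")   # deliver the result

    threading.Thread(target=worker, daemon=True).start()
    # get() blocks without a busy loop: the waiting thread is parked
    # until the worker calls put(), so there is no check interval to tune.
    return results.get()
```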
I am using `future_lapply` parallelism and see that fl_master is consuming CPU resources. Ultimately I think this ends up in `mc_master` in a while loop that's checking every 0.1 seconds (by default). Is it possible to have the default be an exponential backoff going from 0.1 seconds to 2 minutes if nothing happens in the loop?