mschubert / clustermq

R package to send function calls as jobs on LSF, SGE, Slurm, PBS/Torque, or each via SSH
https://mschubert.github.io/clustermq/
Apache License 2.0

User-specified worker timeouts #188

Closed wlandau closed 4 years ago

wlandau commented 4 years ago

I have received several reports of drake users encountering worker timeouts, most notably https://github.com/ropensci/drake/issues/1146, https://github.com/ropensci/drake/issues/1148, and https://github.com/ropensci/drake/issues/1150. They all seem related to the default timeout of worker().

https://github.com/mschubert/clustermq/blob/9ba37a3cd17d82e39fda8133a4e1cd36cc76b50d/R/worker.r#L10

What if users could set their own worker timeouts? Is this already possible in the template? If not, I am open to building a clustermq_worker_timeout argument into drake::make(), but it would require analogous functionality in clustermq::workers().
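To make the request concrete, here is a minimal sketch of a worker loop whose idle timeout is supplied by the caller. It is written in Python for illustration only and is not clustermq's actual R implementation; the names `run_worker` and `idle_timeout` and the 600-second default are hypothetical placeholders.

```python
import queue


def run_worker(tasks, idle_timeout=600.0):
    """Run callables taken from `tasks` until shutdown or idle timeout.

    `tasks` is a queue.Queue of zero-argument callables; a None item is a
    shutdown signal. `idle_timeout` (seconds) is the user-specified limit on
    how long the worker waits for new work. All names here are hypothetical.
    """
    results = []
    while True:
        try:
            # Block for at most `idle_timeout` seconds waiting for work.
            task = tasks.get(timeout=idle_timeout)
        except queue.Empty:
            # No work arrived in time: the worker gives up.
            return results, "timeout"
        if task is None:
            # Explicit shutdown signal from the master process.
            return results, "shutdown"
        results.append(task())
```

The point of the sketch is only that the waiting period is a parameter rather than a hard-coded constant, so a caller with long gaps between tasks could raise it.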

cc @jennysjaarda, @mike-lawrence.

mschubert commented 4 years ago

I see your point.

Timeouts should go away in 0.9 and be replaced by monitoring worker sockets for disconnects (https://github.com/mschubert/clustermq/projects/5).

If that release gets delayed much longer (because of my time constraints), I'll release a 0.8.9 that includes this option.

wlandau commented 4 years ago

Fantastic, thanks!