send a target to a second worker in clustermq parallelism

ropensci / drake

An R-focused pipeline toolkit for reproducibility and high-performance computing

GNU General Public License v3.0

1.34k stars 128 forks

Prework

[x] Read and abide by drake's code of conduct.
[x] Search for duplicates among the existing issues, both open and closed.

Proposal

I found a case where a dynamic target came very close to finishing but never did, even though I still had workers up and waiting for work. I suspect that targets were allocated to workers that then disappeared when they hit the HPC time limit. What I would have liked is for drake to recognise that a worker has disappeared and then send its target to another worker that is still around.

I believe this would require clustermq to be able to report which workers have disappeared (through SLURM, in my case).
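
To make the request more concrete, here is a rough sketch of the behaviour I have in mind. It is plain base R and does not touch drake or clustermq internals: `requeue_lost_targets()`, `assignments`, and `alive_workers` are hypothetical names made up for illustration, and it simply assumes that something (for example the scheduler) can supply the list of workers that are still running.

```r
# Hypothetical sketch: none of these names exist in drake or clustermq.
# `assignments` maps each target name to the worker it was sent to.
# `alive_workers` is assumed to list the workers that are still running.
requeue_lost_targets <- function(assignments, alive_workers) {
  # Targets whose worker is no longer alive need to go somewhere else.
  lost <- names(assignments)[!(assignments %in% alive_workers)]
  if (length(lost) == 0 || length(alive_workers) == 0) {
    # Nothing to move, or nobody left to take the work.
    return(assignments)
  }
  # Hand the lost targets to the surviving workers in round-robin order.
  assignments[lost] <- rep_len(alive_workers, length(lost))
  assignments
}

assignments <- c(x_1 = "worker_a", x_2 = "worker_b", x_3 = "worker_c")
alive_workers <- c("worker_a")  # worker_b and worker_c hit the time limit

result <- requeue_lost_targets(assignments, alive_workers)
print(result)
```

In this toy example the two targets held by the vanished workers end up on `worker_a`. The hard part in practice is obtaining `alive_workers` reliably, which is why I think the information has to come from clustermq.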

send a target to a second worker in clustermq parallelism #1287
