Open jakirkham opened 2 years ago
FWIW here's a user who seems to be looking for something like this, namely each node being managed by one Agent:
https://dask.discourse.group/t/slurmcluster-on-64-nodes-understanding-cluster-scale-method/66/4
Yeah this is the kind of use case I was thinking about.
I'm very much conflicted about whether there should be some node level process that sits above the `Nanny` and `Worker` or whether the `Nanny` should be refactored to become a singleton itself and manage many workers.
Either way, having a singleton process that maps to nodes would give us more flexibility and the ability to do what you are suggesting.
Something I've been thinking about lately is how different `concurrent.futures.Executor`s could be composed. A few `Executor`s that might be of interest include the usual suspects as well as some potential new additions. This could also allow users to swap in and out customizations to these. Some things they might include:

- `ProcessPoolExecutor`
- `ThreadPoolExecutor`
- `StreamPoolExecutor` ( https://github.com/rapidsai/dask-cuda/issues/641 )

One way these might be combined is to create an Agent that owns and occupies a full node, which can then be managed efficiently (leveraging shared memory https://github.com/dask/distributed/issues/4497 for example). This could take some of the burden off of the Scheduler as well by having fewer larger Agents (as opposed to many Worker processes).
Admittedly it's possible this abstraction doesn't fit here, but fits somewhere else. Just something I've been thinking about 🙂
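
To make the composition idea a bit more concrete, here is a minimal sketch using only the standard library. The `Agent` class, its `submit_to` method, and the `"processes"` / `"threads"` names are all hypothetical and are not part of Dask or `concurrent.futures`. It only illustrates one node-level object fronting several child `Executor`s, and says nothing about how the Scheduler, `Nanny`, or shared memory would be wired in.

```python
from concurrent.futures import (
    Executor,
    Future,
    ProcessPoolExecutor,
    ThreadPoolExecutor,
)


class Agent(Executor):
    """Hypothetical node-level executor that routes work to named child executors."""

    def __init__(self, executors, default):
        if default not in executors:
            raise ValueError(f"unknown default executor: {default!r}")
        self._executors = dict(executors)
        self._default = default

    def submit_to(self, name, fn, /, *args, **kwargs) -> Future:
        """Submit a call to one specific child executor, chosen by name."""
        return self._executors[name].submit(fn, *args, **kwargs)

    def submit(self, fn, /, *args, **kwargs) -> Future:
        """Standard Executor interface: route to the default child executor."""
        return self.submit_to(self._default, fn, *args, **kwargs)

    def shutdown(self, wait=True, *, cancel_futures=False):
        """Shut down every child executor owned by this agent."""
        for executor in self._executors.values():
            executor.shutdown(wait=wait, cancel_futures=cancel_futures)


if __name__ == "__main__":
    # One agent per node, owning both a process pool and a thread pool.
    agent = Agent(
        {
            "processes": ProcessPoolExecutor(max_workers=2),
            "threads": ThreadPoolExecutor(max_workers=4),
        },
        default="threads",
    )
    with agent:
        # Explicitly target the process pool.
        print(agent.submit_to("processes", pow, 2, 10).result())
        # Executor.map goes through submit(), so it uses the default pool.
        print(list(agent.map(len, ["a", "bb"])))
```

Because `Agent` subclasses `Executor`, inherited helpers such as `map` and the context-manager protocol work unchanged. Swapping in a custom pool (for example something like the proposed `StreamPoolExecutor`) would just mean passing another entry in the dictionary.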