Would it be possible to use Dask's built-in configuration files for this? I think with those you can set the default scheduler for the running system.
For the user it would be more straightforward to have only one configuration file to edit.
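Roughly what I'm thinking of; the scheduler key comes from Dask's config system, but treat the exact file location and values as assumptions on my part:

```python
# Sketch: Dask merges YAML files from ~/.config/dask/ into dask.config, and
# the "scheduler" key selects the default scheduler for compute() calls.
# Setting it at runtime should be equivalent to putting "scheduler: threads"
# in a config file such as ~/.config/dask/dask.yaml.
import dask

dask.config.set(scheduler="threads")       # or "synchronous", "processes", ...
print(dask.config.get("scheduler", None))  # -> "threads"
```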
I don't see any config options in the Dask documentation for setting the scheduler address in the configuration file. I tested by starting a scheduler ($ dask-scheduler) and a worker ($ dask-worker) in other terminals, then inspecting dask.config.config before and after creating a Client() connected to the scheduler. The scheduler was set to dask.distributed, but there is no information about the scheduler the client connected to.
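For reference, roughly the check I did; the address is just the default one dask-scheduler prints, so adjust as needed:

```python
# terminal 1: $ dask-scheduler
# terminal 2: $ dask-worker tcp://localhost:8786
import dask
from dask.distributed import Client

before = dict(dask.config.config)
client = Client("tcp://localhost:8786")  # connect to the running scheduler
after = dict(dask.config.config)

# "scheduler" flips to "dask.distributed" in the config, but neither snapshot
# contains the address the client actually connected to.
```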
So based on this, I think we need the client to be created, or not created, based on values set in the trollflow2 configuration file.
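Something along these lines, with the config key names invented just for illustration:

```python
# Hypothetical sketch: create a Client only when the trollflow2 configuration
# asks for one, otherwise fall back to the current default scheduler.
from dask.distributed import Client

def create_client(config):
    settings = config.get("dask_distributed")    # invented key name
    if settings is None:
        return None                              # keep the default scheduler
    address = settings.get("address")
    if address:
        return Client(address)                   # external scheduler
    return Client(**settings.get("kwargs", {}))  # local cluster of workers
```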
Yeah you might be right on the address. Here's the template that dask.distributed uses (there's one for regular dask too):
https://github.com/dask/distributed/blob/master/distributed/distributed.yaml
Lots of options, but maybe not a "use this exact scheduler" option.
As for the options for this, it'd be nice if you allowed the user to specify the name of the cluster class: either as a YAML object path to the Dask class, like we do for satpy readers/composites, or as a name that gets imported directly from dask.distributed (less flexible, but probably works in most cases). That way you could let users create their own Slurm or Kubernetes or other cluster. There's also Dask Gateway, which might be used in the long run by some organizations. Just brainstorming.
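A sketch of the cluster-class idea; the helper and its arguments are hypothetical, but the dotted-path import is similar in spirit to how satpy readers/composites are specified:

```python
# Hypothetical sketch: instantiate any Dask cluster class given as a dotted
# path, e.g. "dask.distributed.LocalCluster" or "dask_jobqueue.SLURMCluster".
from importlib import import_module
from dask.distributed import Client

def client_from_cluster_class(class_path, **cluster_kwargs):
    module_name, class_name = class_path.rsplit(".", 1)
    cluster_cls = getattr(import_module(module_name), class_name)
    cluster = cluster_cls(**cluster_kwargs)
    return Client(cluster)

# e.g. client_from_cluster_class("dask.distributed.LocalCluster", n_workers=4)
```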
Feature Request

Is your feature request related to a problem? Please describe.
Trollflow2 should eventually support dask.distributed for parallelism. Both a local cluster and an external cluster should be supported, but the current default scheduler should also remain a possibility.

Describe the solution you'd like
When suitable configuration is given, use dask.distributed.Client() to either connect to an external scheduler or create a local cluster of workers. If the client config isn't given, the current behaviour with the default scheduler should be kept. The configuration could be something like this:
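The key names below are only illustrative; nothing is settled yet:

```yaml
# Hypothetical sketch of the configuration, key names are not decided
dask_distributed:
  # connect to an already running external scheduler ...
  address: "tcp://localhost:8786"
  # ... or, when no address is given, start a local cluster:
  # kwargs:
  #   n_workers: 4
  #   threads_per_worker: 2
```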
Describe any changes to existing user workflow
Ideally this shouldn't affect the default usage.

Additional context
Some things that might affect the implementation and/or use:
dask.distributed