jbusecke / cmip6_derived_cloud_datasets

Prototype for derived cloud data pipeline using CMIP6 data.

Split dask cluster creation from actual execution #12

Open jbusecke opened 2 years ago

jbusecke commented 2 years ago

Currently the dask cluster is started within the main processing logic (wrapper).

I think it would be advantageous to decouple these steps for several reasons:

  1. This might help down the road when we want to use other ways of spinning up a cluster than the coiled API
  2. If the main Prefect flow fails, this might be a more robust way of 'catching' the cluster and closing it after either success or failure (see #10).
  3. This could also help in executing the logic manually (e.g. on one of the pangeo deployments/local machine). One might be able to spin up a cluster and then just provide the needed information to the execution step, without repeatedly opening/closing clusters.

I would need to find a way to pipe the cluster object (or just the scheduler address?) into the main processing step.
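A rough sketch of what the split could look like, passing only the scheduler address into the execution step. All names here (`managed_cluster`, `run_with_cluster`, `process`) are hypothetical and not taken from this repo. The sketch assumes the cluster object exposes `scheduler_address` and `close()`, as `dask.distributed.LocalCluster` does; I have not checked this against what the coiled API returns.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


@contextmanager
def managed_cluster(cluster_factory: Callable[[], object]) -> Iterator[str]:
    """Create a cluster, yield its scheduler address, and always close it.

    `cluster_factory` is any zero-argument callable returning an object with
    a `scheduler_address` attribute and a `close()` method (assumption).
    """
    cluster = cluster_factory()
    try:
        yield cluster.scheduler_address
    finally:
        # Runs on success and on failure of the wrapped step.
        cluster.close()


def run_with_cluster(cluster_factory: Callable[[], object],
                     main: Callable[[str], T]) -> T:
    """Start a cluster, run `main(address)`, then shut the cluster down."""
    with managed_cluster(cluster_factory) as address:
        return main(address)


def process(scheduler_address: str) -> int:
    """Hypothetical execution step: only needs the address of a running cluster."""
    # Imported lazily so the lifecycle code above has no hard dask dependency.
    from dask.distributed import Client

    with Client(scheduler_address) as client:
        # Placeholder for the real processing logic.
        return client.submit(sum, [1, 2, 3]).result()
```

With this layout, the automated path could be something like `run_with_cluster(LocalCluster, process)` (or a factory that wraps the coiled call), while a manual run against an already-running cluster would just call `process("tcp://...")` with that cluster's address, without opening or closing anything.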