jacobtomlinson / dask-agent

A process agent for Dask to provide flexibility and control over starting workers/nannies

Experiment with environment syncing using conda-pack #2

Open: jacobtomlinson opened this issue 1 year ago

jacobtomlinson commented 1 year ago

Quick example using Dask Kubernetes

In [1]: from dask_kubernetes.operator import KubeCluster
   ...: from dask.distributed import Client
   ...: cluster = KubeCluster(name="agent", n_workers=3, worker_command="dask-agent", env={"EXTRA_PIP_PACKAGES": "git+https://github.com/jacobtomlinson/dask-agent.git@environment-sync"})
   ...: client = await Client(cluster.scheduler_address, asynchronous=True)
+-------------+----------------+----------------+----------------+
| Package     | Client         | Scheduler      | Workers        |
+-------------+----------------+----------------+----------------+
| cloudpickle | 2.0.0          | 2.2.0          | 2.2.0          |
| lz4         | 3.1.3          | 4.0.2          | 4.0.2          |
| msgpack     | 1.0.3          | 1.0.4          | 1.0.4          |
| numpy       | 1.23.3         | 1.23.4         | 1.23.4         |
| pandas      | 1.4.4          | 1.5.1          | 1.5.1          |
| python      | 3.10.6.final.0 | 3.8.13.final.0 | 3.8.13.final.0 |
+-------------+----------------+----------------+----------------+

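Note that the client is running Python 3.10.6 while the scheduler and workers are running Python 3.8.13. Several other packages (cloudpickle, lz4, msgpack, numpy, pandas) are also mismatched.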
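To see which packages differ without reading the table by eye, a small helper can walk the version report. This is a minimal sketch, not part of dask-agent: `find_mismatches` is a hypothetical name, and it assumes the input has the shape returned by distributed's `Client.get_versions(check=False)` (a dict with `"client"`, `"scheduler"` and `"workers"` keys, each carrying a `"packages"` mapping).

```python
def find_mismatches(versions: dict) -> dict:
    """Return {package: {role: {versions...}}} for packages that differ.

    Hypothetical helper. Assumes `versions` looks like the output of
    Client.get_versions(check=False):
      {"client": {"packages": {...}},
       "scheduler": {"packages": {...}},
       "workers": {address: {"packages": {...}}, ...}}
    """
    seen: dict = {}

    def record(role: str, packages: dict) -> None:
        # Collect every reported version of each package, grouped by role
        for name, version in packages.items():
            seen.setdefault(name, {}).setdefault(role, set()).add(version)

    record("client", versions["client"]["packages"])
    record("scheduler", versions["scheduler"]["packages"])
    for info in versions["workers"].values():
        record("workers", info["packages"])

    # A package is mismatched if more than one distinct version was reported
    return {
        name: roles
        for name, roles in seen.items()
        if len(set().union(*roles.values())) > 1
    }
```

With the asynchronous client from the session above, this could be called as `find_mismatches(await client.get_versions(check=False))`; running it before and after `sync_env` would show which packages the sync actually brought into line.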

In [2]: from dask_agent import sync_env
   ...: await sync_env(client)
Packaging env into /tmp/tmpww5mppg4.tgz...
Packaged
Uploading env...
Uploaded
Reprovisioning nodes...
Reprovisioned
+-------------+----------------+----------------+----------------+
| Package     | Client         | Scheduler      | Workers        |
+-------------+----------------+----------------+----------------+
| cloudpickle | 2.0.0          | 2.2.0          | 2.0.0          |
| dask        | 2022.10.2      | 2022.10.2      | 2022.7.0       |
| distributed | 2022.10.2      | 2022.10.2      | 2022.7.0       |
| lz4         | 3.1.3          | 4.0.2          | 3.1.3          |
| msgpack     | 1.0.3          | 1.0.4          | 1.0.3          |
| python      | 3.10.6.final.0 | 3.8.13.final.0 | 3.10.6.final.0 |
+-------------+----------------+----------------+----------------+

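Note that the client and worker Python versions are now in sync (3.10.6), as are cloudpickle, lz4 and msgpack. Weirdly, though, dask and distributed are now out of sync: the workers report 2022.7.0 while the client and scheduler report 2022.10.2. The scheduler is still on Python 3.8.13.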

Problems