jacobtomlinson / dask-agent

A process agent for Dask to provide flexibility and control over starting workers/nannies

Dask Agent
==========

Dask Agent is a drop-in replacement for ``dask-worker``, ``dask-spec`` and ``dask-cuda-worker``. It aims to give power users more flexibility and control over how worker processes are started.


Problems it solves:




Start a scheduler

.. code-block:: console

    $ dask-scheduler

Start an agent process pointing to the scheduler

.. code-block:: console

    $ dask-agent tcp://<ip>:8786
    distributed.agent - INFO - Starting Dask Agent
    distributed.agent - INFO - Agent at:  tcp://
    distributed.agent - INFO - Provisioning nodes with mode 'auto-cpu'
    distributed.agent - INFO - Found 12 CPU cores, provisioning 4 processes with 3 threads each
    distributed.nanny - INFO -         Start Nanny at: 'tcp://'
    distributed.nanny - INFO -         Start Nanny at: 'tcp://'
    distributed.nanny - INFO -         Start Nanny at: 'tcp://'
    distributed.nanny - INFO -         Start Nanny at: 'tcp://'
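
The split reported in the log above (12 cores into 4 processes with 3 threads each) is consistent with a simple divisor-based rule, similar to the default process/thread split that ``LocalCluster`` applies. The sketch below is an assumption about how ``auto-cpu`` could choose its layout, not code taken from the agent; ``factors`` and ``split_cores`` are hypothetical names.

.. code-block:: python

    import math


    def factors(n):
        """Return all positive divisors of n in ascending order."""
        return [i for i in range(1, n + 1) if n % i == 0]


    def split_cores(n_cores):
        """Split n_cores into (processes, threads_per_process).

        Assumed rule: on small machines use one process per core; otherwise
        pick the smallest divisor that is at least sqrt(n_cores) as the
        process count and give each process an equal share of threads.
        """
        if n_cores < 1:
            raise ValueError("n_cores must be a positive integer")
        if n_cores <= 4:
            processes = n_cores
        else:
            processes = min(f for f in factors(n_cores) if f >= math.sqrt(n_cores))
        return processes, n_cores // processes


    print(split_cores(12))  # (4, 3)

For 12 cores this gives 4 processes with 3 threads each, which matches the log output.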

Connect a client to the scheduler and change the provisioning mode

.. code-block:: python

    In [1]: from dask.distributed import Client

    In [2]: client = await Client("tcp://localhost:8786", asynchronous=True)

    In [3]: await client.scheduler.reprovision_nodes(mode="auto-gpu")

Watch the agent close the CPU workers and attempt to start GPU workers. In this run the attempt fails because ``dask_cuda`` is not installed.

.. code-block:: console

    distributed.agent - INFO - Closing all subprocesses
    distributed.nanny - INFO - Closing Nanny at 'tcp://'
    distributed.nanny - INFO - Closing Nanny at 'tcp://'
    distributed.nanny - INFO - Closing Nanny at 'tcp://'
    distributed.worker - INFO - Stopping worker at tcp://
    distributed.nanny - INFO - Closing Nanny at 'tcp://'
    distributed.worker - INFO - Stopping worker at tcp://
    distributed.worker - INFO - Stopping worker at tcp://
    distributed.worker - INFO - Stopping worker at tcp://
    distributed.agent - INFO - Provisioning nodes with mode 'auto-gpu'
    distributed.agent - ERROR - Cannot provision GPU workers, unable to find dask_cuda
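
The error above suggests that the ``dask_cuda`` package could not be found in the agent's environment. A minimal way to check for it before requesting ``auto-gpu`` is sketched below. It assumes nothing about the agent's own detection logic, and ``has_module`` is a hypothetical helper.

.. code-block:: python

    from importlib.util import find_spec


    def has_module(name):
        """Return True if a top-level module can be located without importing it."""
        return find_spec(name) is not None


    if not has_module("dask_cuda"):
        print("dask_cuda not found; GPU provisioning is likely to fail")

Run this in the same Python environment as the agent, since the check only reflects the interpreter it runs in.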


Spec
^^^^

Reprovision each agent to run a worker from a spec.

.. code-block:: python

await client.scheduler.reprovision_nodes(
        "spec": {
            "cls": "dask.distributed.Nanny",
            "opts": {},