xpublish-community / xpublish

Publish Xarray Datasets via a REST API.
https://xpublish.readthedocs.io
Apache License 2.0

Experimenting with Dask integration #208

Open abkfenris opened 1 year ago

abkfenris commented 1 year ago

After the Dask discussion two weeks ago (see https://github.com/orgs/xpublish-community/discussions/4) I sat down and sketched out what an implementation could look like in Xpublish. It's really rough, and thoroughly untested.

This adds two local plugins and associated infrastructure for most hooks to be able to use Dask.

In most cases, for different types of Dask infrastructure, a plugin that provides a get_dask_cluster() method should do the trick. The hook is set up to only return one result, and the built-in plugin will be the last one tried.
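To illustrate the "only one result, built-in last" behavior, here is a minimal stdlib-only sketch. It does not use Xpublish's real plugin machinery: the two plugin classes, the `first_cluster` helper, and the placeholder strings standing in for cluster objects are all hypothetical. Only the hook name get_dask_cluster() comes from the description above.

```python
from typing import Iterable, Optional


class CustomClusterPlugin:
    """Hypothetical third-party plugin that supplies its own cluster."""

    def get_dask_cluster(self) -> Optional[str]:
        # A real plugin would build or connect to a cluster here;
        # a string placeholder keeps the sketch dependency-free.
        return "custom-cluster"


class BuiltinClusterPlugin:
    """Hypothetical stand-in for the built-in fallback plugin."""

    def get_dask_cluster(self) -> Optional[str]:
        return "builtin-local-cluster"


def first_cluster(plugins: Iterable[object]) -> Optional[str]:
    """Return the first non-None result, trying plugins in order."""
    for plugin in plugins:
        hook = getattr(plugin, "get_dask_cluster", None)
        if hook is None:
            continue  # this plugin does not implement the hook
        result = hook()
        if result is not None:
            return result  # the first answer wins; later plugins are skipped
    return None


# The built-in plugin is listed last, so it only answers if nothing else does.
chosen = first_cluster([CustomClusterPlugin(), BuiltinClusterPlugin()])
fallback = first_cluster([BuiltinClusterPlugin()])
```

With a custom plugin registered, `chosen` is that plugin's cluster; with only the built-in plugin present, `fallback` is the built-in one.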

The Dask client plugin should, in theory, work with different types of clusters, but it can similarly be overridden (dask-on-ray?). The client can be both sync and async, and once it gets accessed, it's cached on xpublish.Rest.
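The "cached once accessed" idea can be sketched with functools.cached_property. This is only an assumption about one way to do it, not the PR's actual code: `RestSketch`, `client_factory`, and the `created` counter are hypothetical names, and a plain `object()` stands in for a Dask client.

```python
from functools import cached_property
from typing import Any, Callable


class RestSketch:
    """Hypothetical stand-in for xpublish.Rest, showing lazy client caching."""

    def __init__(self, client_factory: Callable[[], Any]) -> None:
        self._client_factory = client_factory
        self.created = 0  # counts how many clients were actually built

    @cached_property
    def dask_sync_client(self) -> Any:
        # Runs only on first access; the result is then stored on the instance.
        self.created += 1
        return self._client_factory()


rest = RestSketch(client_factory=object)  # object() stands in for a client
first = rest.dask_sync_client
second = rest.dask_sync_client
```

Both accesses return the same object, and the factory runs a single time.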

For hooks that have access to deps (which now includes dataset providers), deps.dask_sync_client and deps.dask_async_client should give you the client.

The async client may need to be passed the current event loop. It appears the way to access the event loop varies by server, so that will probably take some research.
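As a starting point for that research, the sketch below shows the plain asyncio way to get the current loop from inside a coroutine. Whether this is sufficient for the Dask async client under every server is exactly the open question; the function names here are hypothetical and nothing in it touches Dask or a real web server.

```python
import asyncio


async def handler_sees_running_loop() -> asyncio.AbstractEventLoop:
    # Inside a coroutine, this returns the loop that is currently running it.
    return asyncio.get_running_loop()


def loop_matches() -> bool:
    """Run the coroutine on a fresh loop and check it saw that same loop."""
    loop = asyncio.new_event_loop()
    try:
        return loop.run_until_complete(handler_sees_running_loop()) is loop
    finally:
        loop.close()


same_loop = loop_matches()
```

If a server runs its handlers as ordinary coroutines, the same call inside a handler should return that server's loop, though that remains to be checked per server.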

abkfenris commented 1 year ago

Some of the many tabs I had open while pondering Dask, Xarray, async, and FastAPI, and resources from others: