ratt-ru / shadeMS

Rapid Measurement Set plotting with dask-ms and datashader

Implement cudf backend in dask_utils.dataframe_factory #107

Open sjperkins opened 1 year ago


In various places throughout shade_ms/dask_utils.py, we assume an underlying pandas DataFrame when creating a dask DataFrame.

https://github.com/ratt-ru/shadeMS/blob/4026ff44e5831c3369905a80205e8b08035ccfbe/shade_ms/dask_utils.py#L22-L26

https://github.com/ratt-ru/shadeMS/blob/4026ff44e5831c3369905a80205e8b08035ccfbe/shade_ms/dask_utils.py#L138-L143

As described in https://medium.com/rapids-ai/easy-cpu-gpu-arrays-and-dataframes-run-your-dask-code-where-youd-like-e349d92351d?s=03, it is now possible to specify a DataFrame backend when creating dask DataFrames:

>>> import dask
>>> import dask.dataframe as dd
>>> with dask.config.set({"dataframe.backend": "cudf"}):
...     data = {"a": range(10), "b": range(10)}
...     ddf = dd.from_dict(data, npartitions=2)
...
>>> ddf
<dask_cudf.DataFrame | 2 tasks | 2 npartitions>

shadeMS should respect this option. The following code might be one way of doing so in dask_utils.py:

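from importlib import import_module

import dask

# The option holds the backend's module name as a string, e.g. "pandas" or "cudf"
backend = import_module(dask.config.get("dataframe.backend"))
dataframe = backend.DataFrame(...)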
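
A slightly fuller sketch of the same idea, in case it helps the discussion. The names `resolve_dataframe_module` and `make_partition` are hypothetical and do not exist in shadeMS. The sketch assumes the "dataframe.backend" option names an importable module that exposes a pandas-compatible `DataFrame` constructor, which should hold for pandas and cudf but is not checked here:

```python
from importlib import import_module

import dask


def resolve_dataframe_module(default="pandas"):
    """Import the module named by dask's "dataframe.backend" option.

    Hypothetical helper: falls back to `default` when the option is unset
    (e.g. on older dask versions that predate the option).
    """
    name = dask.config.get("dataframe.backend", default)
    return import_module(name)


def make_partition(columns, index=None):
    """Build a single in-memory DataFrame with the configured backend.

    Hypothetical helper: `columns` maps column names to array-likes,
    as accepted by pandas.DataFrame (and, by assumption, cudf.DataFrame).
    """
    xdf = resolve_dataframe_module()
    return xdf.DataFrame(columns, index=index)
```

Wrapping a call in `with dask.config.set({"dataframe.backend": "cudf"}):` would then make `make_partition` return a cudf DataFrame, provided cudf is installed; with the default setting it returns a pandas DataFrame.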