Open rsignell-usgs opened 4 years ago
It's failing at https://github.com/holoviz/holoviews/blob/4dcddcc3a8a278dea550147def62f46ecd3d5d1d/holoviews/core/data/dask.py#L272-L277, where holoviews is trying to insert a numpy array into a dask dataframe.
df['index'] = np.arange(n)
but that would fail. We need to wrap the NumPy array in a Dask Array before inserting it. The tricky point is making sure that the chunks of the array matches the partitions of the dataframe.
Here's the basic idea
In [1]: import dask.dataframe as dd
In [2]: import dask
In [4]: import dask.array as da
In [5]: import numpy as np
In [6]: ts = dask.datasets.timeseries(end='2000-01-05')
In [7]: ts
Out[7]:
Dask DataFrame Structure:
id name x y
npartitions=4
2000-01-01 int64 object float64 float64
2000-01-02 ... ... ... ...
2000-01-03 ... ... ... ...
2000-01-04 ... ... ... ...
2000-01-05 ... ... ... ...
Dask Name: make-timeseries, 4 tasks
In [8]: index = np.arange(len(ts))
In [9]: chunks = tuple(ts.map_partitions(len).compute())
In [10]: ts['index'] = da.from_array(index, chunks)
In [11]: ts
Out[11]:
Dask DataFrame Structure:
id name x y index
npartitions=4
2000-01-01 int64 object float64 float64 int64
2000-01-02 ... ... ... ... ...
2000-01-03 ... ... ... ... ...
2000-01-04 ... ... ... ... ...
2000-01-05 ... ... ... ... ...
Dask Name: assign, 21 tasks
The unfortunate part is the chunks = tuple(ts.map_partitions(len).compute())
. I think that's unavoidable though, since we need the chunks to align... Hopefully that's not too expensive (or maybe holoviews already gets the chunk sizes elsewhere?)
I see that trimesh has support for dask from the table in the datashader performance page: and there also an datashader trimesh notebook with a dask example
It would be great to add a similar dask example after the last cell in: https://github.com/pyviz-topics/examples/blob/master/bay_trimesh/bay_trimesh.ipynb to basically dask-enable this call:
When I try it, I get:
My full notebook is here:
https://nbviewer.jupyter.org/gist/rsignell-usgs/cf853d43fd5e53ba90fbd4e0b9eb3da7