holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

Trimesh support for dask dataframes very different from pandas #4927

Closed jlstevens closed 3 years ago

jlstevens commented 3 years ago

In this example I tried getting the HoloViews example at the end to work (note that the example is wrong: neither of those images is a wireframe!). I did this by converting verts, tris and mesh to their dask versions, verts_ddf, tris_ddf and mesh_ddf, using:

import datashader.utils as du  # alias assumed to match the notebook
from dask import dataframe as dd
verts_ddf = dd.from_pandas(verts, npartitions=4)
tris_ddf = dd.from_pandas(tris, npartitions=4)
mesh_ddf = du.mesh(verts_ddf, tris_ddf).persist()
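
For readers without the notebook, a toy stand-in for verts and tris may help when trying the snippets in this thread. This is only a sketch: the column names x/y/z match what the code here refers to, while v0/v1/v2 for the triangle corners is an assumption about the notebook's layout, and the data itself is made up.

```python
import pandas as pd

# Hypothetical stand-in data: a unit square split into two triangles.
# x, y and z are the vertex columns used elsewhere in this thread;
# the v0/v1/v2 names for the triangle corners are an assumption.
verts = pd.DataFrame(
    {
        "x": [0.0, 1.0, 1.0, 0.0],
        "y": [0.0, 0.0, 1.0, 1.0],
        "z": [0.0, 1.0, 2.0, 3.0],
    }
)

# Each row holds the integer positions (in verts) of one triangle's corners.
tris = pd.DataFrame({"v0": [0, 0], "v1": [1, 2], "v2": [2, 3]})

# Every vertex id referenced by a triangle must exist in the vertex table.
assert tris.to_numpy().max() < len(verts)
```

With frames like these in place, the dd.from_pandas calls above can be tried with a smaller npartitions value.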

Now the datashader portion of the notebook works fine using the dask dataframes instead of the original pandas dataframes. The HoloViews example at the end, however, does not. This is what works with pandas (fixing the example to show a wireframe):

import datashader as ds  # needed for ds.mean below
import holoviews as hv
import geoviews as gv
from holoviews import opts
from holoviews.operation.datashader import datashade

hv.extension("bokeh")
opts.defaults(
    opts.Image(width=450, height=450),
    opts.RGB(width=450, height=450))

wireframe = datashade(hv.TriMesh((tris,hv.Points(verts, vdims=[])), label="Wireframe"))
trimesh = datashade(hv.TriMesh((tris,hv.Points(verts, vdims='z')), label="TriMesh"), aggregator=ds.mean('z'))
wireframe + trimesh

With this result:

image

But if I just replace tris with tris_ddf and verts with verts_ddf, I get an exception:

NotImplementedError: Dask dataframe does not support assigning non-scalar value.

Here is what does work, using a very different constructor:

wire_nodes = hv.TriMesh((tris_ddf, hv.Nodes(verts_ddf, ['x', 'y', 'index'], [])), label="Wireframe")
# label belongs to the TriMesh, not to the Nodes element
trimesh_nodes = hv.TriMesh((tris_ddf, hv.Nodes(verts_ddf, ['x', 'y', 'index'], ['z'])), label="TriMesh")
wireframe = datashade(wire_nodes)
trimesh = datashade(trimesh_nodes, aggregator=ds.mean('z'))
wireframe + trimesh

image

This is unintuitive (and @philippjfr tells me that it isn't necessary if an index column is present), and it would be nice if you could transparently replace pandas dataframes with dask ones.

jbednar commented 3 years ago

Here's the code involved, which I tested in this environment:

python=3.6.11
notebook=6.1.5
ipykernel=5.3.4
colorcet=1.0.0
datashader=0.12.1
geoviews=1.9.1
holoviews=1.14.3
pandas=1.1.5
dask=2020.12.0

bay_trimesh.ipynb.gz

jlstevens commented 3 years ago

@philippjfr Given that you understand the details of the network format in HoloViews best, do you have any suggestions for either supporting the same constructor style for dask dataframes as for pandas dataframes, or at minimum some way of warning users who try to use dask dataframes the way that works for pandas?
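
To make the "warning" idea concrete, here is a rough sketch of a pre-flight check. It is not part of HoloViews: check_vertices and is_dask_frame are hypothetical names, and detecting dask by module name is only a heuristic chosen so the sketch runs without dask installed.

```python
import warnings


def is_dask_frame(obj):
    """Heuristic: treat any object whose class lives in a dask* module as dask."""
    return type(obj).__module__.startswith("dask")


def check_vertices(verts):
    """Hypothetical check to run before passing raw vertices to a TriMesh.

    Returns True when the pandas-style constructor is expected to be fine,
    and warns and returns False when a dask frame is detected.
    """
    if is_dask_frame(verts):
        warnings.warn(
            "Dask vertices detected: declare the nodes explicitly, e.g. "
            "hv.Nodes(verts, ['x', 'y', 'index'], ...), instead of hv.Points.",
            UserWarning,
            stacklevel=2,
        )
        return False
    return True
```

A real implementation would more likely live inside the dask data interface than in user code; the sketch only shows the kind of message that would have saved time here.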

jbednar commented 3 years ago

Another option may be to change our Pandas examples to use whatever approach works for both Pandas and Dask, if there is one and it is a plausible recommendation for both.

philippjfr commented 3 years ago

Definitely an option. It would be more efficient in both the pandas and dask case to use the existing index rather than adding a new column with an integer index.
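
One possible reading of this, sketched with pandas only: expose the frame's existing index as an explicit index column up front, so that the same hv.Nodes(..., ['x', 'y', 'index'], ...) declaration can be used for either backend. The helper name with_index_column is hypothetical, and the sketch assumes the index is unnamed. Dask dataframes also provide reset_index, but that path is not exercised here.

```python
import pandas as pd


def with_index_column(verts):
    """Expose the existing (unnamed) index as a regular column named 'index'."""
    if "index" in verts.columns:
        return verts  # already explicit, nothing to do
    # For an unnamed index, reset_index() inserts it as a leading 'index' column.
    return verts.reset_index()


verts = pd.DataFrame(
    {"x": [0.0, 1.0, 0.0], "y": [0.0, 0.0, 1.0], "z": [1.0, 2.0, 3.0]}
)
nodes_df = with_index_column(verts)
print(list(nodes_df.columns))
```

Because the dimensions are declared by name, the position of the new column in the frame should not matter.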
