holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Datashader categorical plot not rendering legend with cuDF #1178

Closed orozcojd closed 1 year ago

orozcojd commented 1 year ago

ALL software version info

datashader=0.14.4 cudf=22.12.01 bokeh=2.4.3 panel=0.14.3 pandas=1.5.3 numpy=1.21.5 holoviews=1.15.4

Description of expected behavior and the observed behavior

After migrating my Datashader code from pandas DataFrames to cuDF DataFrames, the legend for my categorical plot no longer shows.

A working example can be found by following the documentation and replacing pd.DataFrame with cudf.DataFrame, or by using the snippet below.

Complete, minimal, self-contained example code that reproduces the issue

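import colorcet as cc
import cudf
import datashader as ds
import holoviews as hv
import pandas as pd
from holoviews.operation.datashader import datashade

df = pd.concat(dfs, ignore_index=True)  # dfs: list of pandas DataFrames (not shown)
df.parameter = df.parameter.astype("category")
cuda_df = cudf.DataFrame.from_pandas(df)
curve = hv.Curve(cuda_df).opts(**options)  # options: plot options (not shown); alternative aggregator: ds.by("parameter", ds.count())
shader = datashade(curve, aggregator=ds.count_cat("parameter"), cnorm="log", color_key=cc.glasbey).opts(**options)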
ianthomas23 commented 1 year ago

Thanks for the bug report; I can reproduce it. Here is a simpler reproducer:

import datashader as ds
import holoviews as hv
from holoviews.operation.datashader import datashade
import pandas as pd
hv.extension('bokeh')

df = pd.DataFrame(dict(
    x    = [0.0, 1.0, 0.0, 1.0, 0.0],
    y    = [0.0, 1.0, 1.0, 0.0, 0.5],
    cat  = ['a', 'b', 'a', 'b', 'a'],
))
df.cat = df.cat.astype("category")

USE_CUDF = True  # set to False to stay on pandas, which works fine
if USE_CUDF:
    import cudf
    df = cudf.DataFrame.from_pandas(df)

curve = hv.Curve(df)
datashade(curve, aggregator=ds.by("cat"))

This works fine without the contents of the if-statement; with them it fails with:

<snip>
File ~/.miniconda/envs/rapids/lib/python3.9/site-packages/holoviews/operation/datashader.py:1423, in shade._process(self, element, key)
   1421         return RGB(img, **params)
   1422     else:
-> 1423         img = tf.shade(array, **shade_opts)
   1424 return RGB(self.uint32_to_uint8_xr(img), **params)

File ~/github/datashader/datashader/transfer_functions/__init__.py:701, in shade(agg, cmap, color_key, how, alpha, min_alpha, span, name, color_baseline, rescale_discrete_levels)
    699         return _interpolate(agg, cmap, how, alpha, span, min_alpha, name, rescale_discrete_levels)
    700 elif agg.ndim == 3:
--> 701     return _colorize(agg, color_key, how, alpha, span, min_alpha, name, color_baseline, rescale_discrete_levels)
    702 else:
    703     raise ValueError("agg must use 2D or 3D coordinates")

File ~/github/datashader/datashader/transfer_functions/__init__.py:416, in _colorize(agg, color_key, how, alpha, span, min_alpha, name, color_baseline, rescale_discrete_levels)
    414 total = nansum_missing(data, axis=2)
    415 mask = np.isnan(total)
--> 416 a = _interpolate_alpha(data, total, mask, how, alpha, span, min_alpha, rescale_discrete_levels)
    418 values = np.dstack([r, g, b, a]).view(np.uint32).reshape(a.shape)
    419 if cupy and isinstance(values, cupy.ndarray):
    420     # Convert cupy array to numpy for final image

File ~/github/datashader/datashader/transfer_functions/__init__.py:490, in _interpolate_alpha(data, total, mask, how, alpha, span, min_alpha, rescale_discrete_levels)
    487         norm_span = norm_span[0]  # Ignore discrete_levels
    489 # Interpolate the alpha values
--> 490 a = interp(a_scaled, array(norm_span), array([min_alpha, alpha]),
    491            left=0, right=255).astype(np.uint8)
    492 return a

File ~/.miniconda/envs/rapids/lib/python3.9/site-packages/cupy/_creation/from_data.py:46, in array(obj, dtype, copy, order, subok, ndmin)
      7 def array(obj, dtype=None, copy=True, order='K', subok=False, ndmin=0):
      8     """Creates an array on the current device.
      9 
     10     This function currently does not support the ``subok`` option.
   (...)
     44 
     45     """
---> 46     return _core.array(obj, dtype, copy, order, subok, ndmin)

File cupy/_core/core.pyx:2357, in cupy._core.core.array()

File cupy/_core/core.pyx:2381, in cupy._core.core.array()

File cupy/_core/core.pyx:2506, in cupy._core.core._array_default()

File cupy/_core/core.pyx:1473, in cupy._core.core._ndarray_base.__array__()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy 

So we need to convert explicitly from CuPy to NumPy arrays in shade() and related functions.
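A minimal sketch of that kind of explicit conversion, using a hypothetical helper named `to_numpy` (not Datashader's actual code or the fix that was merged). It copies CuPy arrays to the host with `cupy.asnumpy` and passes everything else through `np.asarray`, so the same code path works whether or not CuPy is installed:

```python
import numpy as np

try:
    import cupy
except ImportError:  # CPU-only environment: CuPy is optional
    cupy = None


def to_numpy(arr):
    """Return arr as a NumPy array (hypothetical helper).

    CuPy forbids implicit conversion (np.asarray on a cupy.ndarray raises
    TypeError), so device arrays are copied to the host explicitly.
    """
    if cupy is not None and isinstance(arr, cupy.ndarray):
        return cupy.asnumpy(arr)  # explicit device-to-host copy
    return np.asarray(arr)  # NumPy arrays and Python sequences pass through
```

The `isinstance(..., cupy.ndarray)` guard mirrors the check visible in `_colorize` in the traceback, where the final image is converted from CuPy to NumPy.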

ianthomas23 commented 1 year ago

A minimal reproducer using Datashader without HoloViews:

import cudf
import datashader as ds
import pandas as pd

df = pd.DataFrame(dict(
    x    = [0.0, 1.0, 0.0, 1.0, 0.0],
    y    = [0.0, 1.0, 1.0, 0.0, 0.5],
    cat  = ['a', 'b', 'a', 'b', 'a'],
))
df.cat = df.cat.astype("category")
df = cudf.DataFrame.from_pandas(df)

canvas = ds.Canvas(3, 4)
agg = canvas.points(df, 'x', 'y', agg=ds.by("cat"))
im = ds.transfer_functions.shade(agg, how='eq_hist', rescale_discrete_levels=True)

All four of the following are needed to reproduce it: a cuDF DataFrame, a categorical aggregate, how='eq_hist', and rescale_discrete_levels=True.

The underlying problem is in the transfer_functions._rescale_discrete_levels function.
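The traceback suggests that the failing call is `array(norm_span)` in `_interpolate_alpha`, where the span apparently holds CuPy 0-d arrays that cannot be converted implicitly. One way to guard against this, sketched here with a hypothetical helper `span_to_floats` rather than Datashader's actual fix, is to turn each endpoint into a plain Python float, which both `numpy.array` and `cupy.array` accept:

```python
import numpy as np


def span_to_floats(span):
    """Convert (lo, hi) span endpoints to Python floats (hypothetical helper).

    float() on a 0-d CuPy array copies the value to the host, so the
    resulting list can be passed to numpy.array or cupy.array safely.
    """
    return [float(v) for v in span]


# Example with NumPy scalars standing in for 0-d device values.
span = span_to_floats((np.float32(1.0), np.float64(4.0)))

# Linearly map data values in the span onto an alpha range, as interp does.
alpha = np.interp([1.0, 2.5, 4.0], np.array(span), np.array([40.0, 255.0]))
```

Coercing the two endpoints is cheap because a span has only two values, so the host round trip does not touch the aggregate data itself.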

ianthomas23 commented 1 year ago

The Datashader side of this is fixed by #1179, so the error no longer occurs. The HoloViews issue of the legend not being displayed is tracked in holoviz/holoviews#5619.

ianthomas23 commented 1 year ago

The HoloViews side of this problem is fixed by holoviz/holoviews#5631, so I am closing this issue as completed.