dynspread/spread not working for datashaded plots with aggregator=ds.by(column, ds.any())

Noskario commented 2 years ago

software version info

numpy 1.20.3, pandas 1.3.3, bokeh 2.3.3, holoviews 1.14.6, datashader 0.13.0

Description of expected behavior and the observed behavior

I want to make a scatterplot (colored by category) in which some sparse points remain visible even when other regions have a much higher density. I think ds.any() should be the way to go in this case. Unfortunately, when I apply dynspread to this plot, the points disappear and the whole image that datashader produces gets a strange background color. (Interestingly, this color is not always the same.)
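To illustrate why an any-style reduction helps with sparse points, here is a rough numpy sketch. It is not Datashader's implementation: np.histogram2d stands in for a count aggregate, "counts > 0" stands in for any, and plain linear scaling is assumed (Datashader's shade actually defaults to histogram equalization, so real images will differ).

```python
import numpy as np

# Toy data: a dense cluster of 1000 points around (1, 1) plus one isolated point at (9.5, 9.5).
rng = np.random.default_rng(0)
xs = np.concatenate([rng.normal(1.0, 0.1, 1000), [9.5]])
ys = np.concatenate([rng.normal(1.0, 0.1, 1000), [9.5]])

# Bin onto a 10x10 pixel grid covering [0, 10] x [0, 10]: a count-style aggregate.
counts, _, _ = np.histogram2d(xs, ys, bins=10, range=[[0, 10], [0, 10]])

# An any-style aggregate only records whether a pixel received at least one point.
any_agg = counts > 0

# With simple linear scaling, the isolated pixel gets a tiny fraction of the maximum...
count_scaled = counts / counts.max()

# ...whereas under "any" it has exactly the same value as every pixel in the cluster.
print(count_scaled[9, 9], any_agg[9, 9], any_agg.sum())
```

The cluster's 1000 points fall into at most four pixels, so the densest pixel holds at least 250 points and the isolated one ends up below 1% of the maximum under linear scaling, while any treats all occupied pixels alike.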

Have a look at the following example:

Complete, minimal, self-contained example code that reproduces the issue

import numpy as np
import pandas as pd
import holoviews as hv
hv.extension('bokeh')
import datashader as ds
from datashader.colors import Sets1to3
from holoviews.operation.datashader import datashade,dynspread,spread

raw_data = [('Alice', 60, 'London', 5),
            ('Bob', 14, 'Delhi', 7),
            ('Charlie', 66, np.nan, 11),
            ('Dave', np.nan, 'Delhi', 15),
            ('Eveline', 33, 'Delhi', 4),
            ('Fred', 32, 'New York', np.nan),
            ('George', 95, 'Paris', 11)
            ]
# Create a DataFrame object
df = pd.DataFrame(raw_data, columns=['Name', 'Age', 'City', 'Experience'])
df['City']=pd.Categorical(df['City'])

x='Age'
y='Experience'
color='City'
cats=df[color].cat.categories

# Make dummy-points (currently the only way to make a legend: https://holoviews.org/user_guide/Large_Data.html)
color_key=[(name,color) for name, color in zip(cats,Sets1to3)]
color_points = hv.NdOverlay({n: hv.Points([0,0], label=str(n)).opts(color=c,size=0) for n,c in color_key})

# Create the plot with datashader
points=hv.Points(df, [x, y],label="%s vs %s" % (x, y),)#.redim.range(Age=(0,90), Experience=(0,14))
datashaded1=datashade(points,aggregator=ds.by(color)).opts(width=550, height=480)
datashaded2=datashade(points,aggregator=ds.by(color,ds.any())).opts(width=550, height=480)

dynspread(datashaded1)*color_points+dynspread(datashaded2)*color_points
# spread(datashade(points,aggregator=ds.by(color,ds.any())).opts(width=550, height=480))*color_points

We get the following result:

[Screenshot: example_for_comparing]
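For context, Datashader documents dynspread as choosing how many pixels to spread via a density heuristic: spreading starts at 1 pixel and stops when the fraction of adjacent non-empty pixels reaches a threshold (or max_px is hit). The sketch below is a simplified numpy version of that kind of heuristic, not Datashader's actual code; neighbour_density is a hypothetical helper, and "non-empty" is assumed to mean alpha > 0.

```python
import numpy as np

def neighbour_density(alpha):
    """Fraction of non-empty pixels (alpha > 0) that have at least one non-empty
    8-connected neighbour. Returns 0.0 for an image with no non-empty pixels."""
    filled = np.asarray(alpha) > 0
    h, w = filled.shape
    padded = np.pad(filled, 1)  # pad with False so edge pixels see empty neighbours
    has_neighbour = np.zeros_like(filled)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # a pixel is not its own neighbour
            has_neighbour |= padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    n_filled = int(filled.sum())
    return float((filled & has_neighbour).sum()) / n_filled if n_filled else 0.0

# Two isolated points: no non-empty pixel touches another.
sparse = np.zeros((5, 5), dtype=np.uint8)
sparse[0, 0] = sparse[4, 4] = 255

# Every pixel slightly tinted: every pixel counts as non-empty.
tinted = np.full((5, 5), 40, dtype=np.uint8)

print(neighbour_density(sparse), neighbour_density(tinted))
```

Under this heuristic, isolated points give a density of 0.0 (spreading is worthwhile), while an image in which every pixel has some alpha gives 1.0 (no spreading needed).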

jbednar commented 2 years ago

Thanks for filing the issue! I can verify that I get the same result, but don't yet know what the cause is.

jbednar commented 2 years ago

I'm still not quite sure why this happens, but the upcoming Datashader release, currently installable as a dev version (conda install -c pyviz/label/dev datashader), should have two new features relevant to it:

1. shade has a new option, rescale_discrete_values, which the next HoloViews release should let you specify in datashade (and which is likely to become the default), so that isolated pixels are fully visible when you zoom into their general area. This might remove the reason you were using any in the first place.
2. The new release includes floating-point versions of any and count, and I've verified that using ds.any_f32 doesn't suffer from this problem on your example. I'm not sure why the datatype would matter here, and that's a clue for tracking this down, but in the meantime just use any_f32 instead of any.

Also, instead of any version of any (no pun intended), you can consider using min_alpha=128 (or similar), which should ensure that even isolated points are fully visible, at the expense of a lower dynamic range to indicate point density.
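As a rough illustration of the min_alpha trade-off, the sketch below maps per-pixel totals to alpha with plain linear scaling into [min_alpha, 255]. This is a simplifying assumption: Datashader's shade uses how='eq_hist' by default, and alpha_from_totals is a hypothetical helper, not part of its API. The default of 40 mirrors shade's default min_alpha.

```python
import numpy as np

def alpha_from_totals(totals, min_alpha=40, max_alpha=255):
    """Empty pixels get alpha 0; non-empty pixels are scaled linearly into
    [min_alpha, max_alpha] between the smallest and largest non-empty totals."""
    totals = np.asarray(totals, dtype=float)
    alpha = np.zeros(totals.shape)
    mask = totals > 0
    if mask.any():
        lo, hi = totals[mask].min(), totals[mask].max()
        if hi > lo:
            norm = (totals[mask] - lo) / (hi - lo)
        else:
            # All non-empty pixels are equal (sketch's choice): make them fully opaque.
            norm = np.ones(int(mask.sum()))
        alpha[mask] = min_alpha + norm * (max_alpha - min_alpha)
    return np.rint(alpha).astype(np.uint8)

# One empty pixel, one isolated point, one very dense pixel.
totals = [0, 1, 1000]
print(alpha_from_totals(totals))                 # isolated point sits at the default floor
print(alpha_from_totals(totals, min_alpha=128))  # isolated point is much more opaque
```

The isolated pixel's alpha is pinned at min_alpha, so raising min_alpha makes it more visible, while the range left for showing density (min_alpha up to 255) shrinks accordingly.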

holoviz / datashader