holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

create_ramp_legend does not include the color for the highest density #277

Open StevenCHowell opened 7 years ago

StevenCHowell commented 7 years ago

This may be related to issue #276. In testing different palettes in the legend example notebook, it seems the legend never includes the highest concentration of data points.

I tried a palette with only two colors, palette = ['white', 'navy'], and made a few changes in the create_base_plot definition:

    # img = tf.shade(agg, cmap=Hot, how='log')
    img = tf.shade(agg, cmap=palette, how='eq_hist')
    ...
    fig.background_fill_color = 'white'  # 'black'

then got the following result. [image] There is an obvious difference between the figure and the legend. It is surprising that navy is not included in any of the legends ('eq_hist', 'log', or 'linear'); in fact, these three legends exhibit only minor differences.

Repeating this with the viridis 256 colormap, from bokeh.palettes import Viridis256 as palette, gives a similarly surprising result, with no yellow in any of the legends. [image]

jbednar commented 7 years ago

I can't tell from your code above whether the legends are working correctly, but I agree that they don't appear to be. Unfortunately, I don't think the person who wrote the legend examples has time to look at it now, and it will be a few weeks before the next person who will work on this starts, so you happen to have discovered this issue during a gap when we can't do anything about it. If you aren't able to solve it soon, please check back with me around the end of Feb, when I can hopefully put someone to work on legend and hover support to make them work really solidly.

StevenCHowell commented 7 years ago

How well this works seems to depend on the number of counts. I tried it with a different data set and the result looked fine.

brendancol commented 7 years ago

@jbednar @StevenCHowell

Sounds like a bug to me.

I'm not sure if this fully explains it, but create_ramp_legend only supports linear and log as arguments for how. It would be great if this also changed. The spot in the code to look at is: https://github.com/bokeh/datashader/blob/master/datashader/utils.py#L101-L107

I'm marking this a bug and assigning to myself.
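
As a rough illustration of why the `how` argument matters here, the sketch below maps a list of counts to positions along a color ramp under the three methods named in this thread. It is not datashader's code: the `normalize` function and the sample counts are hypothetical, and the `eq_hist` branch is a simplified rank-based stand-in for histogram equalization.

```python
import math
from bisect import bisect_right


def normalize(counts, how):
    """Map each nonzero count to a position in [0, 1] along a color ramp.

    Hypothetical helper for illustration only; not datashader's implementation.
    """
    values = [c for c in counts if c > 0]
    lo, hi = min(values), max(values)
    if lo == hi:
        # Only one distinct value: place everything at the top of the ramp.
        return [1.0 for _ in values]
    if how == "linear":
        return [(v - lo) / (hi - lo) for v in values]
    if how == "log":
        span = math.log(hi) - math.log(lo)
        return [(math.log(v) - math.log(lo)) / span for v in values]
    if how == "eq_hist":
        # Simplified rank-based mapping: use the empirical CDF, rescaled so
        # the smallest value maps to 0 and the largest to 1.
        ordered = sorted(values)
        cdf = [bisect_right(ordered, v) / len(ordered) for v in values]
        cmin = min(cdf)
        return [(c - cmin) / (1 - cmin) for c in cdf]
    raise ValueError(f"unsupported how: {how!r}")


# Hypothetical counts with one extreme outlier pixel.
counts = [1, 1, 2, 3, 5, 8, 1000]
for how in ("linear", "log", "eq_hist"):
    print(how, [round(p, 3) for p in normalize(counts, how)])
```

In this sketch the largest count lands at position 1.0 under all three mappings, so a legend consistent with the shading would be expected to end on the last palette color. The positions of the intermediate counts differ widely between methods, which is why a legend drawn for one method would not match an image shaded with another.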

StevenCHowell commented 7 years ago

Any update on this issue?

Here are some additional legends that seem strange. Note that for the aggregation method, I used how='log' (though I would prefer to use eq_hist, which is not compatible with create_ramp_legend). For the color map I used cmap = ['white', Category10_10[0]] (where Category10_10 was imported from bokeh.palettes).

This is a plot of all the data I am considering, 90097 points (generated through a Monte Carlo simulation). [images: bokeh_plot 1, bokeh_plot 2] The legend seems not to reach the highest values, though maybe the pixel with the most points is an outlier compared to the other pixels.

This is a selection of the points that correspond to our experimental measurement, only 742 points. [images: bokeh_plot 7, bokeh_plot 8]

For the second case, I know one specific (x, y) point has 22 counts (I am not sure whether more were added because other points fell in the same pixel). I cannot reconcile that with what the legend shows.
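
To make that expectation concrete, here is a minimal sketch of the color one might expect at the top of a two-color log ramp. It assumes the ramp interpolates linearly in RGB between the two palette entries and that the log scale runs from a count of 1 to the maximum count; neither assumption is confirmed to match what datashader or create_ramp_legend does. `ramp_color` is a hypothetical helper, and the end color is `#1f77b4`, which I believe is the value of `Category10_10[0]`.

```python
import math

WHITE = (255, 255, 255)
# Assumed RGB value of bokeh's Category10_10[0] ('#1f77b4').
CATEGORY10_BLUE = (0x1F, 0x77, 0xB4)


def ramp_color(count, max_count, start=WHITE, end=CATEGORY10_BLUE):
    """Interpolate an RGB color for `count` on a log scale from 1 to max_count.

    Hypothetical helper for illustration; not part of datashader or bokeh.
    """
    if count < 1 or max_count <= 1 or count > max_count:
        raise ValueError("expected 1 <= count <= max_count and max_count > 1")
    t = math.log(count) / math.log(max_count)  # 0.0 at count=1, 1.0 at the max
    return tuple(round(s + t * (e - s)) for s, e in zip(start, end))


# With a maximum of 22 counts, the top of the ramp is the full end color.
print(ramp_color(22, 22))
print(ramp_color(1, 22))
```

Under these assumptions, a pixel holding the maximum of 22 counts gets the full end color, so a legend built the same way should also end on that color.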

I am leaning toward just using the bokeh cross glyph, which produces the figures below. My primary motivation for adding this comment is to provide some additional input for debugging the legends.

[images: contacts_all_1, contacts_x2_lt_55_1]