holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License

Duplicating the legend example with other data does not seem to work #276

Open. StevenCHowell opened this issue 7 years ago.

StevenCHowell commented 7 years ago

I am trying to adapt the code from the legend example notebook to another data set. I replaced the data with five Gaussian distributions and updated the corresponding inputs, but the legend is entirely black.

Here is the code I ran (in a Jupyter notebook):

import pandas as pd
import numpy as np

from bokeh.io import output_notebook, show
from bokeh.plotting import Figure
output_notebook()

import datashader as ds
import datashader.transfer_functions as tf

from datashader.colors import Hot
from datashader.bokeh_ext import create_ramp_legend, create_categorical_legend

# create sample dataset
np.random.seed(1)
num=1000000

dists = {cat: pd.DataFrame(dict(x=np.random.normal(x,s,num),
                                y=np.random.normal(y,s,num),
                                val=val,cat=cat))
         for x,y,s,val,cat in 
         [(2,2,0.01,10,"d1"), (2,-2,0.1,20,"d2"), (-2,-2,0.5,30,"d3"), (-2,2,1.0,40,"d4"), (0,0,3,50,"d5")]}

df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")
df.tail()  # show some of the data in an interactive setting

def create_base_plot():

    # use the full extent of the sample data as the plot range
    xmin = df.x.min()
    ymin = df.y.min()
    xmax = df.x.max()
    ymax = df.y.max()

    cvs = ds.Canvas(plot_width=900,
                    plot_height=600,
                    x_range=(xmin, xmax),
                    y_range=(ymin, ymax))

    agg = cvs.points(df, 'x', 'y')
    img = tf.shade(agg, cmap=Hot, how='eq_hist')
    fig = Figure(x_range=(xmin, xmax),
                 y_range=(ymin, ymax),
                 plot_width=600,
                 plot_height=600,
                 tools='')

    fig.background_fill_color = 'black'
    fig.toolbar_location = None
    fig.axis.visible = False
    fig.grid.grid_line_alpha = 0
    fig.min_border_left = 0
    fig.min_border_right = 0
    fig.min_border_top = 0
    fig.min_border_bottom = 0

    fig.image_rgba(image=[img.data],
                   x=[xmin],
                   y=[ymin],
                   dw=[xmax-xmin],
                   dh=[ymax-ymin])
    return fig, (xmin, ymin, xmax, ymax), agg

fig, extent, datashader_agg = create_base_plot()
show(fig)

legend_fig = create_ramp_legend(datashader_agg,
                                Hot,
                                how='eq_hist',
                                width=600)
show(legend_fig)

Here is the result: [image: legend_fail]

I noticed that the range of my aggregation is much larger than in the taxi example: [0, 728852] compared to [0, 1968].

>>> datashader_agg.min()
<xarray.DataArray ()>
array(0, dtype=int32)
>>> datashader_agg.max()
<xarray.DataArray ()>
array(728852, dtype=int32)

The larger range should not by itself cause the error, but I will look into it.
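For context on why a count range like [0, 728852] is usually shaded with 'log' or 'eq_hist' rather than 'linear', here is a minimal NumPy sketch. The `counts` array is made-up illustrative data (a few very dense pixels, like those the narrow d1 Gaussian produces, plus many sparse ones), and the log scaling uses `np.log1p`, which is an assumption rather than necessarily datashader's exact formula.

```python
import numpy as np

# Made-up aggregate counts (illustrative only): a few very dense pixels,
# like the narrow "d1" Gaussian produces, plus many sparse ones.
counts = np.array([728852, 50000, 1200, 300, 40, 12, 5, 1, 1, 0])

# Empty pixels are left transparent, so only nonzero counts get a color.
nonzero = counts[counts > 0]

# Linear mapping: position of each count along a [0, 1] color ramp.
linear_pos = nonzero / nonzero.max()

# Log mapping (assumed log1p scaling), also normalized to [0, 1].
log_pos = np.log1p(nonzero) / np.log1p(nonzero.max())

# Fraction of nonzero pixels that land in the bottom 1% of the ramp,
# i.e. that would be drawn in (nearly) the lowest color.
frac_linear_dark = np.mean(linear_pos < 0.01)
frac_log_dark = np.mean(log_pos < 0.01)
print(frac_linear_dark, frac_log_dark)
```

With these made-up values, 7 of the 9 nonzero pixels (about 78%) fall in the bottom 1% of a linear ramp, while none do under the log scaling.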

I am not certain whether this is a bug or simply a misunderstanding of the example on my part.

StevenCHowell commented 7 years ago

Here is a simpler test script.

imports and setup:

# imports
import pandas as pd
import numpy as np

import datashader as ds
import datashader.transfer_functions as tf

from datashader.bokeh_ext import create_ramp_legend, create_categorical_legend

import bokeh.plotting
bokeh.plotting.output_notebook()

# create sample dataset
np.random.seed(1)
num=1000000
dists = {cat: pd.DataFrame(dict(x=np.random.normal(x,s,num),
                                y=np.random.normal(y,s,num),
                                val=val,cat=cat))
         for x,y,s,val,cat in 
         [(2,2,0.01,10,"d1"), (2,-2,0.1,20,"d2"), (-2,-2,0.5,30,"d3"), 
          (-2,2,1.0,40,"d4"), (0,0,3,50,"d5")]}
df = pd.concat(dists,ignore_index=True)
df["cat"]=df["cat"].astype("category")
df.tail()  # view data sample in interactive view

actual plotting script:

# generate the plot with a legend
height = 600
width = 600

# palette = ['white', 'navy']
from bokeh.palettes import Viridis256 as palette
# from datashader.colors import Hot as palette

how = 'eq_hist'
# how = 'linear'
# how = 'log'

x_range = [df.x.min(), df.x.max()]
y_range = [df.y.min(), df.y.max()]

cvs = ds.Canvas(plot_width=width, plot_height=height,
                x_range=x_range, y_range=y_range)

agg = cvs.points(df, 'x', 'y')
img = tf.shade(agg, cmap=palette, how=how)
fig = bokeh.plotting.Figure(x_range=x_range, y_range=y_range, 
                            plot_width=width, plot_height=height, 
                            tools='')

fig.image_rgba(image=[img.data], x=[x_range[0]], y=[y_range[0]],
               dw=[x_range[1]-x_range[0]], dh=[y_range[1]-y_range[0]])

bokeh.plotting.show(fig)

legend_fig = create_ramp_legend(agg, palette, how=how, width=width)
bokeh.plotting.show(legend_fig)

sample output demonstrating the problem:

[image]

jbednar commented 7 years ago

It looks to me like this is mostly a documentation problem. The docstring for create_ramp_legend implies that any 'how' option is supported, but at present the code only supports 'linear' and 'log' and never checks for other values, so it is not currently safe to use any other 'how' option. I have a plan for supporting other options (#126), but meanwhile I've updated master to show that only those two options are allowed (commit https://github.com/bokeh/datashader/commit/6101791e8).
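Two hedged sketches of what the comment describes. `check_legend_how` is a hypothetical guard that rejects `how` values other than 'linear' and 'log' up front instead of silently drawing a wrong legend; it is not a datashader function. `eq_hist_positions` shows one possible way (not necessarily the approach planned in #126) to place legend ticks for 'eq_hist': histogram equalization spaces colors by the empirical CDF of the data, so a tick's position on the ramp is the fraction of nonzero aggregate values at or below it. The counts are illustrative only.

```python
import numpy as np

# Hypothetical: the 'how' values a ramp legend can currently represent.
SUPPORTED_LEGEND_HOW = ("linear", "log")


def check_legend_how(how):
    """Raise instead of silently drawing a misleading legend (hypothetical helper)."""
    if how not in SUPPORTED_LEGEND_HOW:
        raise ValueError(
            f"how={how!r} is not supported for ramp legends; "
            f"use one of {SUPPORTED_LEGEND_HOW}"
        )
    return how


def eq_hist_positions(values, ticks):
    """Map tick values to [0, 1] ramp positions using the empirical CDF of
    the nonzero aggregate values (a simplified histogram equalization).
    Hypothetical helper for illustration; not a datashader API."""
    data = np.sort(np.asarray(values, dtype=float).ravel())
    data = data[data > 0]  # empty pixels are not part of the color ramp
    # Fraction of nonzero values less than or equal to each tick.
    return np.searchsorted(data, ticks, side="right") / data.size


# Illustrative counts, not taken from the issue's data.
counts = np.array([728852, 50000, 1200, 300, 40, 12, 5, 1, 1, 0])
ticks = np.array([1, 10, 100, 1000, 728852])

print(eq_hist_positions(counts, ticks))  # CDF-based (eq_hist-style) positions
print(ticks / counts.max())              # linear positions, for comparison

check_legend_how("log")  # accepted
try:
    check_legend_how("eq_hist")
except ValueError as err:
    print(err)
```

Under this simplified eq_hist mapping, the tick at 1000 sits two thirds of the way along the ramp, whereas a linear ramp would put it at about 0.14% of the way, which is why a legend laid out with linear spacing cannot describe an eq_hist-shaded image.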