holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org

How to handle discrete data and scale factor #302

Open HenkHcks opened 7 years ago

HenkHcks commented 7 years ago

Hi all,

Datashader is great for finding patterns in my datasets; however, I'm currently working with a dataset that has a low resolution on one axis. According to one of the Datashader examples, the width_scale factor can be used to reduce this effect. But when I tune the scale factor to my liking and then zoom in, the gaps between my data points increase, making it almost impossible to see any patterns. What is the correct way to handle discrete data?

Below is a self-contained example that shows the effect.

import pandas as pd
import numpy as np
from bokeh.plotting import figure, output_notebook, show
import datashader as ds
from datashader.bokeh_ext import InteractiveImage  # needed for the last line

output_notebook()

p = figure(
        x_range=(-5, 5),
        y_range=(-5, 5),
        tools='pan,wheel_zoom,box_zoom,reset',
        plot_width=800,
        plot_height=500,
    )

# Rounding x to integers leaves only a handful of distinct values on that axis
n = 10000
df = pd.DataFrame({'x': np.random.normal(size=n), 'y': np.random.normal(size=n)})
df.x = df.x.round()

# width_scale reduces the x-resolution of the aggregation canvas,
# widening each discrete x column on screen
pipeline = ds.Pipeline(df, ds.glyphs.Point("x", "y"), width_scale=0.04)
InteractiveImage(p, pipeline)
jbednar commented 7 years ago

The good news and the bad news is that you have asked an excellent question. In that example, I showed how to use width_scale to force an axis with only a few valid values to stretch across the visible area in a useful way:

[Screenshot: the example output, with the discretized axis stretched across the visible area]

I was happy with the result, but it required tuning the width scale and the ranges precisely, and indeed, that tuning is only appropriate for a certain fixed zoom level. I don't often have data with a discretized axis like this, and if I did I would need to come up with a better solution than having to tune a magic parameter like that. It sounds like you do have such data, so I hereby nominate you to come up with a good solution for a case like this. :-) I'm happy to discuss possibilities and to help polish and merge any resulting code, but I think there would need to be some specific new code added to make this situation work well in general.

Note that if all you care about is zooming, you could try zooming only on the y axis (in Bokeh, enable Wheel Zoom, then hover over the y axis and adjust the scroll wheel). That should keep the x resolution fixed while letting you see more detail in the y direction. But that won't help avoid the magic number in the first place...
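
To avoid the magic number in the first place, one could derive width_scale from the data. Here is a minimal sketch, assuming width_scale simply scales the x-resolution of the aggregation canvas relative to the plot width (the helper name and the pixels_per_level parameter are hypothetical):

import numpy as np

def estimate_width_scale(x, plot_width, pixels_per_level=2):
    # Choose width_scale so the canvas has roughly pixels_per_level
    # columns per distinct x value (heuristic, not a datashader API)
    n_levels = len(np.unique(x))
    return min(1.0, n_levels * pixels_per_level / plot_width)

# For the rounded-normal data above (about nine distinct x values):
# estimate_width_scale(df.x, 800)  ->  roughly 0.02

Recomputing this on each zoom, counting only the distinct values inside the visible x_range, would also address the widening gaps when zooming in.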

HenkHcks commented 7 years ago

First, thanks for your answer. I'll try, though a disclaimer is needed: I've never contributed like this before. I would also like to consider the more general case of discrete steps where the steps are non-uniform. This comes in handy when you want to plot values derived from discrete data, for example when you calculate the angle between x and y.
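
As a toy illustration of such non-uniform steps (just a NumPy sketch):

import numpy as np

# Angles derived from integer-valued x and y fall on a fixed but
# unevenly spaced set of values
xx, yy = np.meshgrid(np.arange(-3, 4), np.arange(-3, 4))
angles = np.unique(np.arctan2(yy, xx))
print(np.diff(angles))  # the step sizes are clearly non-uniform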

I see some possibility in adding to or changing the aggregating Point class. Just as with the Line class, we could add multiple points around each measurement, indicating the inaccuracies. Thinking aloud:

How does this sound?
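
One way to prototype this without touching the Point glyph is to expand the dataframe in user code, replicating each row with x offsets that span its quantization interval. This is only a sketch; the half-width of 0.5 matches the rounding in the example above:

import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

n, k = 10000, 8
df = pd.DataFrame({'x': np.random.normal(size=n).round(),
                   'y': np.random.normal(size=n)})

# Replicate each row k times, spreading x uniformly across the
# interval [-0.5, 0.5) around each rounded value
offsets = np.linspace(-0.5, 0.5, k, endpoint=False)
expanded = pd.DataFrame({'x': (df.x.values[:, None] + offsets).ravel(),
                         'y': np.repeat(df.y.values, k)})

cvs = ds.Canvas(plot_width=800, plot_height=500,
                x_range=(-5, 5), y_range=(-5, 5))
img = tf.shade(cvs.points(expanded, 'x', 'y'))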

jbednar commented 7 years ago

Sorry for the delay -- I started to reply to this last week, but never submitted it!

If you have discrete, non-uniform steps, you could conceivably map them onto a uniform sequence and then use a Bokeh custom tick formatter to show the original non-uniform values. That's just one approach, though, and not necessarily a convenient one.
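
A minimal sketch of that mapping, using Bokeh's FuncTickFormatter (the levels values here are made up):

from bokeh.models import FixedTicker, FuncTickFormatter
from bokeh.plotting import figure, output_notebook, show

output_notebook()

levels = [0.0, 0.1, 0.5, 2.0, 10.0]           # non-uniform discrete values
index = {v: i for i, v in enumerate(levels)}  # map onto a uniform 0..N-1 axis

p = figure(plot_width=400, plot_height=300)
p.circle([index[v] for v in [0.1, 0.5, 0.5, 2.0, 10.0]], [1, 2, 3, 4, 5])

# Ticks sit at the uniform positions but are labeled with the original values
p.xaxis.ticker = FixedTicker(ticks=list(range(len(levels))))
p.xaxis.formatter = FuncTickFormatter(code="var labels = %s; return labels[tick];" % levels)
show(p)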

There's already an issue open about indicating uncertainty, and we'd be happy to move in that direction if it can be specified clearly.

Adding rectangular-shaped pixels is also likely to be useful in general, e.g. to someday eliminate the (sometimes problematic) dependency on rasterio.
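
Until then, something similar can be approximated at the transfer-function stage. This sketch assumes a datashader version whose tf.spread accepts a custom boolean mask:

import numpy as np
import pandas as pd
import datashader as ds
import datashader.transfer_functions as tf

df = pd.DataFrame({'x': np.random.normal(size=10000).round(),
                   'y': np.random.normal(size=10000)})
cvs = ds.Canvas(plot_width=800, plot_height=500,
                x_range=(-5, 5), y_range=(-5, 5))
img = tf.shade(cvs.points(df, 'x', 'y'))

# A square mask with only its middle row set spreads each point
# horizontally only, approximating wide, flat marks
mask = np.zeros((5, 5), dtype=bool)
mask[2, :] = True
wide = tf.spread(img, mask=mask)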

I'm not sure what you mean about weighting the pixels.