holoviz / datashader

Quickly and accurately render even the largest data.
http://datashader.org
BSD 3-Clause "New" or "Revised" License
3.31k stars 366 forks

Histograms for big data using datashader #608

Closed: rtmlp closed this issue 5 years ago

rtmlp commented 6 years ago

Hi,

I have had this idea for a while and searched in vain for a way to do it, so I am posting the question here.

I have successfully created datashader plots for different columns of a large dataframe, with all the interactive capabilities. Right now, I can plot a histogram of the entire dataset for a given column. But is there a way to use a callback to dynamically create a histogram for only the data in the current datashader selection, or any other way you can think of to create a dynamic histogram?

Thanks

philippjfr commented 6 years ago

If you're using HoloViews, you can easily do this using the .hist method on the rasterized image. Basically, you rasterize the data into an Image, compute a histogram from the Image values, and then still apply shading by mapping the shade function onto all images.

import holoviews as hv, numpy as np
from holoviews.operation.datashader import rasterize, shade
hv.extension("bokeh")

points = hv.Points(np.random.randn(10000000, 2))
rasterize(points).hist(dimension='Count').map(shade, hv.Image)

If you're aggregating by a particular column in a dataframe you just declare that:

points = hv.Points(df, ['x', 'y'], vdims='some_column')
rasterize(points, aggregator='mean').hist(dimension='some_column').map(shade, hv.Image)
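To make clear what such a histogram counts, here is a minimal, library-independent sketch in plain Python. It is not HoloViews' or datashader's implementation, and the helper names `rasterize_mean` and `histogram` are hypothetical. It assumes that a 'mean' aggregation averages the column over the points landing in each pixel, so the histogram of the image counts pixels, not raw points.

```python
from collections import defaultdict


def rasterize_mean(rows, x_range, y_range, width, height):
    """Average the value of all points falling in each cell of a grid.

    rows is an iterable of (x, y, value) tuples. Returns {(col, row): mean};
    cells that receive no points are omitted. Hypothetical helper.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    sums = defaultdict(float)
    counts = defaultdict(int)
    for x, y, value in rows:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # point lies outside the viewport
        # Clamp so points exactly on the upper edge land in the last cell.
        col = min(int((x - x0) / (x1 - x0) * width), width - 1)
        row = min(int((y - y0) / (y1 - y0) * height), height - 1)
        sums[col, row] += value
        counts[col, row] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


def histogram(values, num_bins, value_range):
    """Count values into num_bins equal-width bins over value_range."""
    lo, hi = value_range
    counts = [0] * num_bins
    for v in values:
        if lo <= v <= hi:
            idx = min(int((v - lo) / (hi - lo) * num_bins), num_bins - 1)
            counts[idx] += 1
    return counts


rows = [(0.1, 0.1, 1.0), (0.2, 0.2, 3.0), (0.9, 0.9, 10.0), (0.8, 0.1, 4.0)]
cells = rasterize_mean(rows, (0.0, 1.0), (0.0, 1.0), width=2, height=2)

# Histogram of per-pixel means (three occupied pixels) ...
pixel_hist = histogram(cells.values(), num_bins=2, value_range=(0.0, 10.0))
# ... versus histogram of the four raw values.
raw_hist = histogram((v for _, _, v in rows), num_bins=2, value_range=(0.0, 10.0))
```

In this toy data the first two points share a pixel, so the pixel histogram sees one averaged value where the raw histogram sees two.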

If you want the histogram as a separate plot:

points = hv.Points(df, ['x', 'y'], vdims='some_column')
img = rasterize(points, aggregator='mean')
shade(img) + img.hist(dimension='some_column', adjoin=False).options(framewise=True)

If you're actually trying to compute a histogram of the raw data rather than the aggregated image, you can do this:

from holoviews.operation.datashader import datashade

points = hv.Points(df, ['x', 'y'], vdims='some_column')
range_stream = hv.streams.RangeXY()
shaded = datashade(points, streams=[range_stream])
# Select the raw points inside the current viewport whenever the ranges change
dyn_hist = hv.DynamicMap(lambda x_range, y_range: points.select(x=x_range, y=y_range), streams=[range_stream])
shaded + dyn_hist.hist(adjoin=False, dimension='some_column').options(framewise=True)

jbednar commented 6 years ago

Thanks, @philippjfr. Output from your first code example above, at different zooms:

[Three images: output of the first code example at different zoom levels]

rtmlp commented 6 years ago

Hi,

Thank you @philippjfr for the comment and the code. I tried to use the code on a dataframe to see how the histogram changes with the selection of the data on the plot to the left.

I am having trouble with the histogram on the right: it shows nearly uniform values even though the data has at least two modes, and the values themselves look incorrect, as there are no observations around a y-value of -200.

I used the following code to reproduce the plot:

points = hv.Points(df, ['id_', 'col_'])
rasterize(points, aggregator='mean').hist(dimension='col_').map(shade, hv.Image)

[Image: plot on the dataframe]

philippjfr commented 6 years ago

Apologies @rtmatx, you probably need to enable rescaling of the histogram axis, like this:

points = hv.Scatter(df, 'id_', 'col_')
rasterized = rasterize(points, aggregator='mean')
rasterized.hist(dimension='col_').map(shade, hv.Image).options('Histogram', framewise=True)

rtmlp commented 6 years ago

Thanks @philippjfr. I tried the above code and got an error. I have pasted the error stack here. It looks like a few IPython functions are not able to process it.

TypeError                                 Traceback (most recent call last)
<ipython-input-71-13d33c392405> in <module>()
      9 points = hv.Scatter(df, 'id_', 'col_')
     10 rasterized = rasterize(points, aggregator='mean')
---> 11 rasterized.hist(dimension='col_').map(shade, hv.Image).options('Histogram', framewise=True)

~/anaconda/envs/m/lib/python3.6/site-packages/IPython/core/displayhook.py in __call__(self, result)
    255             self.start_displayhook()
    256             self.write_output_prompt()
--> 257             format_dict, md_dict = self.compute_format_data(result)
    258             self.update_user_ns(result)
    259             self.fill_exec_result(result)

~/anaconda/envs/m/lib/python3.6/site-packages/IPython/core/displayhook.py in compute_format_data(self, result)
    149 
    150         """
--> 151         return self.shell.display_formatter.format(result)
    152 
    153     # This can be set to True by the write_output_prompt method in a subclass

~/anaconda/envs/m/lib/python3.6/site-packages/IPython/core/formatters.py in format(self, obj, include, exclude)
    148             return {}, {}
    149 
--> 150         format_dict, md_dict = self.mimebundle_formatter(obj, include=include, exclude=exclude)
    151 
    152         if format_dict or md_dict:

TypeError: 'NoneType' object is not iterable

I tried to check the values of some of the variables in the stack. The values of obj and result in the error stack are:

obj :AdjointLayout
   :DynamicMap   []
   :DynamicMap   []

philippjfr commented 6 years ago

Is there any more to that traceback? It would also be helpful to know which versions of holoviews and IPython you have.

rtmlp commented 6 years ago

HoloViews is 1.10.2 and IPython is 6.2.1. Sorry, there is a lot of traceback that I didn't include.

~/anaconda/envs/m/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj, include, exclude)
    968 
    969             if method is not None:
--> 970                 return method(include=include, exclude=exclude)
    971             return None
    972         else:

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/dimension.py in _repr_mimebundle_(self, include, exclude)
   1229         combined and returned.
   1230         """
-> 1231         return Store.render(self)
   1232 
   1233 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/options.py in render(cls, obj)
   1287         data, metadata = {}, {}
   1288         for hook in hooks:
-> 1289             ret = hook(obj)
   1290             if ret is None:
   1291                 continue

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/ipython/display_hooks.py in pprint_display(obj)
    278     if not ip.display_formatter.formatters['text/plain'].pprint:
    279         return None
--> 280     return display(obj, raw_output=True)
    281 
    282 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/ipython/display_hooks.py in display(obj, raw_output, **kwargs)
    251     elif isinstance(obj, (Layout, NdLayout, AdjointLayout)):
    252         with option_state(obj):
--> 253             output = layout_display(obj)
    254     elif isinstance(obj, (HoloMap, DynamicMap)):
    255         with option_state(obj):

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/ipython/display_hooks.py in wrapped(element)
    140         try:
    141             max_frames = OutputSettings.options['max_frames']
--> 142             mimebundle = fn(element, max_frames=max_frames)
    143             if mimebundle is None:
    144                 return {}, {}

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/ipython/display_hooks.py in layout_display(layout, max_frames)
    221         return None
    222 
--> 223     return render(layout)
    224 
    225 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/ipython/display_hooks.py in render(obj, **kwargs)
     63         renderer = renderer.instance(fig='png')
     64 
---> 65     return renderer.components(obj, **kwargs)
     66 
     67 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/plotting/bokeh/renderer.py in components(self, obj, fmt, comm, **kwargs)
    257         # Bokeh has to handle comms directly in <0.12.15
    258         comm = False if bokeh_version < '0.12.15' else comm
--> 259         return super(BokehRenderer, self).components(obj,fmt, comm, **kwargs)
    260 
    261 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/plotting/renderer.py in components(self, obj, fmt, comm, **kwargs)
    319             plot = obj
    320         else:
--> 321             plot, fmt = self._validate(obj, fmt)
    322 
    323         widget_id = None

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/plotting/renderer.py in _validate(self, obj, fmt, **kwargs)
    218         if isinstance(obj, tuple(self.widgets.values())):
    219             return obj, 'html'
--> 220         plot = self.get_plot(obj, renderer=self, **kwargs)
    221 
    222         fig_formats = self.mode_formats['fig'][self.mode]

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/plotting/bokeh/renderer.py in get_plot(self_or_cls, obj, doc, renderer)
    150             doc = Document() if self_or_cls.notebook_context else curdoc()
    151         doc.theme = self_or_cls.theme
--> 152         plot = super(BokehRenderer, self_or_cls).get_plot(obj, renderer)
    153         plot.document = doc
    154         return plot

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/plotting/renderer.py in get_plot(self_or_cls, obj, renderer)
    185         """
    186         # Initialize DynamicMaps with first data item
--> 187         initialize_dynamic(obj)
    188 
    189         if not isinstance(obj, Plot):

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/plotting/util.py in initialize_dynamic(obj)
    242             continue
    243         if not len(dmap):
--> 244             dmap[dmap._initial_key()]
    245 
    246 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in __getitem__(self, key)
   1118         # Not a cross product and nothing cached so compute element.
   1119         if cache is not None: return cache
-> 1120         val = self._execute_callback(*tuple_key)
   1121         if data_slice:
   1122             val = self._dataslice(val, data_slice)

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in _execute_callback(self, *args)
    904 
    905         with dynamicmap_memoization(self.callback, self.streams):
--> 906             retval = self.callback(*args, **kwargs)
    907         return self._style(retval)
    908 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in __call__(self, *args, **kwargs)
    570 
    571         try:
--> 572             ret = self.callable(*args, **kwargs)
    573         except KeyError:
    574             # KeyError is caught separately because it is used to signal

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/util/__init__.py in dynamic_operation(*key, **kwargs)
    435             def dynamic_operation(*key, **kwargs):
    436                 self.p.kwargs.update(kwargs)
--> 437                 return self._process(map_obj[key], key)
    438         if isinstance(self.p.operation, Operation):
    439             return OperationCallable(dynamic_operation, inputs=[map_obj],

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in __getitem__(self, key)
   1118         # Not a cross product and nothing cached so compute element.
   1119         if cache is not None: return cache
-> 1120         val = self._execute_callback(*tuple_key)
   1121         if data_slice:
   1122             val = self._dataslice(val, data_slice)

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in _execute_callback(self, *args)
    904 
    905         with dynamicmap_memoization(self.callback, self.streams):
--> 906             retval = self.callback(*args, **kwargs)
    907         return self._style(retval)
    908 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in __call__(self, *args, **kwargs)
    570 
    571         try:
--> 572             ret = self.callable(*args, **kwargs)
    573         except KeyError:
    574             # KeyError is caught separately because it is used to signal

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/util/__init__.py in dynamic_operation(*key, **kwargs)
    435             def dynamic_operation(*key, **kwargs):
    436                 self.p.kwargs.update(kwargs)
--> 437                 return self._process(map_obj[key], key)
    438         if isinstance(self.p.operation, Operation):
    439             return OperationCallable(dynamic_operation, inputs=[map_obj],

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in __getitem__(self, key)
   1118         # Not a cross product and nothing cached so compute element.
   1119         if cache is not None: return cache
-> 1120         val = self._execute_callback(*tuple_key)
   1121         if data_slice:
   1122             val = self._dataslice(val, data_slice)

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in _execute_callback(self, *args)
    904 
    905         with dynamicmap_memoization(self.callback, self.streams):
--> 906             retval = self.callback(*args, **kwargs)
    907         return self._style(retval)
    908 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/spaces.py in __call__(self, *args, **kwargs)
    570 
    571         try:
--> 572             ret = self.callable(*args, **kwargs)
    573         except KeyError:
    574             # KeyError is caught separately because it is used to signal

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/util/__init__.py in dynamic_operation(*key, **kwargs)
    431                 self.p.kwargs.update(kwargs)
    432                 obj = map_obj[key] if isinstance(map_obj, HoloMap) else map_obj
--> 433                 return self._process(obj, key)
    434         else:
    435             def dynamic_operation(*key, **kwargs):

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/util/__init__.py in _process(self, element, key)
    417             kwargs = {k: v for k, v in self.p.kwargs.items()
    418                       if k in self.p.operation.params()}
--> 419             return self.p.operation.process_element(element, key, **kwargs)
    420         else:
    421             return self.p.operation(element, **self.p.kwargs)

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/operation.py in process_element(self, element, key, **params)
    141         """
    142         self.p = param.ParamOverrides(self, params)
--> 143         return self._apply(element, key)
    144 
    145 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/operation.py in _apply(self, element, key)
    119         for hook in self._preprocess_hooks:
    120             kwargs.update(hook(self, element))
--> 121         ret = self._process(element, key)
    122         for hook in self._postprocess_hooks:
    123             ret = hook(self, ret, **kwargs)

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/operation/datashader.py in _process(self, element, key)
    742             op = transform.instance(**op_params)
    743             op._precomputed = self._precomputed
--> 744             element = element.map(op, predicate)
    745             self._precomputed = op._precomputed
    746         return element

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/dimension.py in map(self, map_fn, specs, clone)
    692             return deep_mapped
    693         else:
--> 694             return map_fn(self) if applies else self
    695 
    696 

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/operation.py in __call__(self, element, **params)
    161                                 operation=self, kwargs=params)
    162         elif isinstance(element, ViewableElement):
--> 163             processed = self._apply(element)
    164         elif isinstance(element, DynamicMap):
    165             if any((not d.values) for d in element.kdims):

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/operation.py in _apply(self, element, key)
    119         for hook in self._preprocess_hooks:
    120             kwargs.update(hook(self, element))
--> 121         ret = self._process(element, key)
    122         for hook in self._postprocess_hooks:
    123             ret = hook(self, ret, **kwargs)

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/operation/datashader.py in _process(self, element, key)
    460             # Replacing x and y coordinates to avoid numerical precision issues
    461             eldata = agg if ds_version > '0.5.0' else (xs, ys, agg.data)
--> 462             return self.p.element_type(eldata, **params)
    463         else:
    464             layers = {}

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/element/raster.py in __init__(self, data, kdims, vdims, bounds, extents, xdensity, ydensity, rtol, **params)
    254             params['rtol'] = config.image_rtol
    255 
--> 256         Dataset.__init__(self, data, kdims=kdims, vdims=vdims, extents=extents, **params)
    257         if not self.interface.gridded:
    258             raise DataError("%s type expects gridded data, %s is columnar."

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/data/__init__.py in __init__(self, data, kdims, vdims, **kwargs)
    195         validate_vdims = kwargs.pop('_validate_vdims', True)
    196         initialized = Interface.initialize(type(self), data, kdims, vdims,
--> 197                                            datatype=kwargs.get('datatype'))
    198         (data, self.interface, dims, extra_kws) = initialized
    199         super(Dataset, self).__init__(data, **dict(kwargs, **dict(dims, **extra_kws)))

~/anaconda/envs/m/lib/python3.6/site-packages/holoviews/core/data/interface.py in initialize(cls, eltype, data, kdims, vdims, datatype)
    209                                   % (intfc.__name__, e))
    210                 error = ' '.join([error, priority_error])
--> 211             raise DataError(error)
    212 
    213         return data, interface, dims, extra_kws

DataError: None of the available storage backends were able to support the supplied data format. XArrayInterface raised following error:

 cannot create a Dataset from a DataArray with the same name as one of its coordinates

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-71-13d33c392405> in <module>()
      9 points = hv.Scatter(temp, 'id_', 'col_')
     10 rasterized = rasterize(points, aggregator='mean')
---> 11 rasterized.hist(dimension='col_').map(shade, hv.Image).options('Histogram', framewise=True)

~/anaconda/envs/m/lib/python3.6/site-packages/IPython/core/displayhook.py in __call__(self, result)
    255             self.start_displayhook()
    256             self.write_output_prompt()
--> 257             format_dict, md_dict = self.compute_format_data(result)
    258             self.update_user_ns(result)
    259             self.fill_exec_result(result)

~/anaconda/envs/m/lib/python3.6/site-packages/IPython/core/displayhook.py in compute_format_data(self, result)
    149 
    150         """
--> 151         return self.shell.display_formatter.format(result)
    152 
    153     # This can be set to True by the write_output_prompt method in a subclass

~/anaconda/envs/m/lib/python3.6/site-packages/IPython/core/formatters.py in format(self, obj, include, exclude)
    148             return {}, {}
    149 
--> 150         format_dict, md_dict = self.mimebundle_formatter(obj, include=include, exclude=exclude)
    151 
    152         if format_dict or md_dict:

TypeError: 'NoneType' object is not iterable

philippjfr commented 6 years ago

Could you try updating to holoviews 1.10.5? I can't reproduce the issue.

rtmlp commented 6 years ago

I updated holoviews to 1.10.5 but got the same error stack. I wonder whether something is wrong with the dataset, given this error:

DataError: None of the available storage backends were able to support the supplied data format. XArrayInterface raised following error:

 cannot create a Dataset from a DataArray with the same name as one of its coordinates

rtmlp commented 5 years ago

I was able to create the histograms using the raw data. I noticed that dyn_hist doesn't actually recalculate the frequencies for the selected data but just rescales the earlier histogram plot. Is it supposed to behave like that?
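For comparison, recomputing the frequencies for a selection means filtering the raw rows to the visible ranges and binning again. The sketch below shows that in plain Python. It is only an illustration, not what HoloViews does internally, and the function name `histogram_in_view` and the toy data are made up for the example.

```python
def histogram_in_view(rows, x_range, y_range, num_bins, value_range):
    """Histogram of `value` for the (x, y, value) rows inside the viewport.

    Hypothetical helper: bins are fixed by value_range so that histograms
    from different viewports are directly comparable.
    """
    x0, x1 = x_range
    y0, y1 = y_range
    lo, hi = value_range
    counts = [0] * num_bins
    for x, y, value in rows:
        if not (x0 <= x <= x1 and y0 <= y <= y1):
            continue  # row is outside the current selection
        if not (lo <= value <= hi):
            continue  # value is outside the histogram range
        # Clamp so value == hi falls into the last bin.
        idx = min(int((value - lo) / (hi - lo) * num_bins), num_bins - 1)
        counts[idx] += 1
    return counts


rows = [(0.0, 0.0, 1.0), (1.0, 1.0, 2.0), (2.0, 2.0, 8.0), (3.0, 3.0, 9.0)]

# Whole extent: two low values and two high values.
full_view = histogram_in_view(rows, (0.0, 3.0), (0.0, 3.0), 2, (0.0, 10.0))
# Zoomed in on the lower-left corner: only the two low values remain.
zoomed = histogram_in_view(rows, (0.0, 1.5), (0.0, 1.5), 2, (0.0, 10.0))
```

If the histogram were only being rescaled, the zoomed result would keep the same shape as the full view. When it is recomputed, the shape can change, as it does here.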

rtmlp commented 5 years ago

I couldn't reproduce the issue after a recent update, and it works with the code mentioned above. Thanks @philippjfr.