Proposal for improving support for wide data

philippjfr commented 1 month ago

From the beginning HoloViews was designed primarily around tidy data. This has the major benefit that data can be clearly delineated into key dimensions (independent values or coordinates) and value dimensions, which represent a dependent variable, i.e. some kind of measurement. It also makes it easy to perform the groupby operations that allow HoloViews to facet data in a grid (GridSpace), in a layout (NdLayout), using widgets (HoloMap/DynamicMap), and as a set of traces in a plot (NdOverlay). However, in many common scenarios data will not be tidy. The most common case is a collection of timeseries indexed by date(time), with multiple columns that all represent the same kind of value, e.g. stock prices, where the index is the date and each column records the price for a different ticker.

The problem with reshaping this data is that it's tremendously inefficient. Where before you could have one DataFrame you now have to create N DataFrames, one for each stock ticker. So here I will lay out my proposal for formally supporting wide data in HoloViews.
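To make the cost concrete, here is a small pure-Python sketch (no pandas) of what splitting a wide table into one table per ticker does to the index. The prices are made-up illustrative numbers and the variable names are hypothetical; the sketch counts index entries rather than bytes.

```python
from datetime import date, timedelta

# Made-up illustrative values; only the shape of the data matters here.
start = date(2024, 1, 1)
dates = [start + timedelta(days=i) for i in range(4)]
wide = {
    "AAPL": [185.0, 186.2, 184.9, 187.1],
    "MSFT": [370.5, 372.0, 371.3, 374.8],
    "IBM": [160.1, 161.0, 159.7, 162.4],
}

# Wide layout: one shared index, one column per ticker.
index_entries_wide = len(dates)

# Split layout: one (date, price) table per ticker, each carrying its own index.
per_ticker = {ticker: list(zip(dates, prices)) for ticker, prices in wide.items()}
index_entries_split = sum(len(rows) for rows in per_ticker.values())

print(index_entries_wide, index_entries_split)
```

The index is repeated once per ticker in the split layout, so the overhead grows linearly with the number of columns.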

The Problem

While today you can already organize data in such a way that you create an NdOverlay where each Element provides a view into one column in the wide DataFrame, it breaks HoloViews' internal model of the world. E.g. let's look at what the structure of the ticker data looks like if you do this:

NdOverlay [ticker]
    Curve [datetime] (AAPL)
    Curve [datetime] (MSFT)
    Curve [datetime] (IBM)

Here the ticker names become the values of the NdOverlay key dimension AND the value dimension names of each Curve element. This is clearly inelegant and also conceptually incorrect: AAPL is not a dimension, as it does not represent an actual measurable quantity with an associated unit. The actual measurable quantity is "Stock Price". This is necessary because the element equates the value dimension with the name of the variable in the underlying data, i.e. the string 'AAPL' is used to look up the column in the underlying DataFrame. Downstream this causes issues for the sharing of dimension ranges in plots and for other features that rely on the identity of Dimensions.

The proposal

There are a few proposals that might give us a way out of this, but they are potentially quite disruptive since HoloViews deeply embeds the assumption that the Dimension.name is the name of the variable in the underlying dataset. Introducing a new distinct variable on the Dimension to distinguish the name of the Dimension from the variable to look up therefore does not seem feasible. The only thing that I believe can feasibly be implemented is relying entirely on the Dimension.label for the identity of the Dimension. In most scenarios the name and label are mirrors of each other anyway, but when a user defines a label that should be sufficient to uniquely identify the Dimension.

Initial testing suggests this would already almost achieve what we want without breaking anything, and a quick survey suggests the required changes are relatively minor:

Dimension.__eq__ should compare just the label, not the name and label, ensuring that Dimension('AAPL', label='Price') and Dimension('MSFT', label='Price') are treated as the same dimension.
The Dimension and Dimensioned reprs should be updated to reflect the label as the source of truth for the identity of the dimension.
The plotting code must now index the dimension ranges by label and also look them up by label.
Logic to link Bokeh axes should be updated to consider only the Dimension.label.

This would be sufficient to fully support wide data without major disruptive changes to HoloViews, ensuring that linking of dimension ranges continues to work and that the reprs correctly represent the conceptual model HoloViews has of the data.
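A minimal sketch of the idea, using a toy class rather than HoloViews itself: `ToyDimension` and `combined_ranges` are hypothetical names invented for illustration and are not part of the HoloViews API. The sketch assumes the label falls back to the name when none is given, that equality is decided by label only, and that ranges are indexed by label while data lookup still uses the name.

```python
class ToyDimension:
    """Hypothetical stand-in for hv.Dimension (not HoloViews code)."""

    def __init__(self, name, label=None):
        self.name = name  # column to look up in the underlying data
        self.label = label if label is not None else name  # identity of the dimension

    def __eq__(self, other):
        if not isinstance(other, ToyDimension):
            return NotImplemented
        return self.label == other.label  # compare the label only

    def __hash__(self):
        return hash(self.label)

    def __repr__(self):
        return f"ToyDimension({self.name!r}, label={self.label!r})"


def combined_ranges(columns, dims):
    """Index value ranges by label so same-label dimensions share one range."""
    ranges = {}
    for dim in dims:
        values = columns[dim.name]  # lookup still goes through the name
        lo, hi = min(values), max(values)
        if dim.label in ranges:
            old_lo, old_hi = ranges[dim.label]
            lo, hi = min(lo, old_lo), max(hi, old_hi)
        ranges[dim.label] = (lo, hi)
    return ranges


# Made-up illustrative values.
wide = {"AAPL": [150.0, 152.5, 149.0], "MSFT": [300.0, 310.0, 305.5]}
aapl = ToyDimension("AAPL", label="Price")
msft = ToyDimension("MSFT", label="Price")

shared = combined_ranges(wide, [aapl, msft])
print(aapl == msft, shared)
```

In this toy model the two columns resolve to a single "Price" range, while dimensions declared without a label still differ because their labels fall back to their distinct names.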

droumis commented 1 month ago

Amazing write-up.

Here is an additional example with code, which would hopefully be addressed by the proposed changes.

In the code below, I believe that we have to redim the channel_name to the common value dimension (amplitude_dim) in order for downsample1d to work, but I think doing this redim prevents the wide-dataframe-index-optimization, slowing things down as the number of lines scales up.

Code

```python
import numpy as np
import pandas as pd
from scipy.stats import zscore
import wget
from pathlib import Path
import mne
import colorcet as cc
import holoviews as hv
from holoviews.plotting.links import RangeToolLink
from holoviews.operation.datashader import rasterize
from holoviews.operation.downsample import downsample1d
from bokeh.models import HoverTool
import panel as pn

pn.extension()
hv.extension('bokeh')
np.random.seed(0)

data_url = 'https://physionet.org/files/eegmmidb/1.0.0/S001/S001R04.edf'
output_directory = Path('./data')

output_directory.mkdir(parents=True, exist_ok=True)
data_path = output_directory / Path(data_url).name
if not data_path.exists():
    data_path = wget.download(data_url, out=str(data_path))

raw = mne.io.read_raw_edf(data_path, preload=True)
raw.set_eeg_reference("average")
raw.rename_channels(lambda s: s.strip("."));

df = raw.to_data_frame()  # TODO: fix rangetool for time_format='datetime'
df.set_index('time', inplace=True)
df.head()

# Viz
amplitude_dim = hv.Dimension("amplitude", unit="µV")
time_dim = hv.Dimension("time", unit="s")  # match the index name in the df

curves = {}
for channel_name, channel_data in df.items():
    curve = hv.Curve(df, kdims=[time_dim], vdims=[channel_name], group="EEG", label=channel_name)

    # TODO: Without the redim, downsample1d errors. But with, it prevents common index slice optimization. :(
    curve = curve.redim(**{str(channel_name): amplitude_dim})

    curve = curve.opts(
        subcoordinate_y=True,
        subcoordinate_scale=2,
        color="black",
        line_width=1,
        tools=["hover"],
        hover_tooltips=[
            ("type", "$group"),
            ("channel", "$label"),
            ("time"),  # TODO: '@time{%H:%M:%S.%3N}'),
            ("amplitude"),
        ],
    )
    curves[channel_name] = curve

curves_overlay = hv.Overlay(curves, kdims="channel").opts(
    ylabel="channel",
    show_legend=False,
    padding=0,
    min_height=500,
    responsive=True,
    shared_axes=False,
    framewise=False,
)

curves_overlay = downsample1d(curves_overlay, algorithm='minmax-lttb')

# minimap
channels = df.columns
time = df.index.values

y_positions = range(len(channels))
yticks = [(i, ich) for i, ich in enumerate(channels)]
z_data = zscore(df, axis=0).T
minimap = rasterize(hv.Image((time, y_positions, z_data), ["Time", "Channel"], "amplitude"))
minimap = minimap.opts(
    cmap="RdBu_r",
    colorbar=False,
    xlabel='',
    alpha=0.5,
    yticks=[yticks[0], yticks[-1]],
    toolbar='disable',
    height=120,
    responsive=True,
    # default_tools=[],
    cnorm='eq_hist'
)

RangeToolLink(minimap, curves_overlay, axes=["x", "y"],
              boundsx=(0, time[len(time)//3])  # limit the initial x-range of the minimap
              )

layout = (curves_overlay + minimap).cols(1)
layout
```


Print curve overlay with redim

```python
:DynamicMap   []
   :Overlay
      .EEG.Fc5 :Curve   [time]   (amplitude)
      .EEG.Fc3 :Curve   [time]   (amplitude)
      .EEG.Fc1 :Curve   [time]   (amplitude)
      .EEG.Fcz :Curve   [time]   (amplitude)
      .EEG.Fc2 :Curve   [time]   (amplitude)
      .EEG.Fc4 :Curve   [time]   (amplitude)
      .EEG.Fc6 :Curve   [time]   (amplitude)
      .EEG.C5 :Curve   [time]   (amplitude)
      .EEG.C3 :Curve   [time]   (amplitude)
      .EEG.C1 :Curve   [time]   (amplitude)
      .EEG.Cz :Curve   [time]   (amplitude)
      .EEG.C2 :Curve   [time]   (amplitude)
      .EEG.C4 :Curve   [time]   (amplitude)
      .EEG.C6 :Curve   [time]   (amplitude)
      .EEG.Cp5 :Curve   [time]   (amplitude)
      .EEG.Cp3 :Curve   [time]   (amplitude)
      .EEG.Cp1 :Curve   [time]   (amplitude)
      .EEG.Cpz :Curve   [time]   (amplitude)
      .EEG.Cp2 :Curve   [time]   (amplitude)
      .EEG.Cp4 :Curve   [time]   (amplitude)
      .EEG.Cp6 :Curve   [time]   (amplitude)
      .EEG.Fp1 :Curve   [time]   (amplitude)
      .EEG.Fpz :Curve   [time]   (amplitude)
      .EEG.Fp2 :Curve   [time]   (amplitude)
      .EEG.Af7 :Curve   [time]   (amplitude)
      .EEG.Af3 :Curve   [time]   (amplitude)
      .EEG.Afz :Curve   [time]   (amplitude)
      .EEG.Af4 :Curve   [time]   (amplitude)
      .EEG.Af8 :Curve   [time]   (amplitude)
      .EEG.F7 :Curve   [time]   (amplitude)
      .EEG.F5 :Curve   [time]   (amplitude)
      .EEG.F3 :Curve   [time]   (amplitude)
      .EEG.F1 :Curve   [time]   (amplitude)
      .EEG.Fz :Curve   [time]   (amplitude)
      .EEG.F2 :Curve   [time]   (amplitude)
      .EEG.F4 :Curve   [time]   (amplitude)
      .EEG.F6 :Curve   [time]   (amplitude)
      .EEG.F8 :Curve   [time]   (amplitude)
      .EEG.Ft7 :Curve   [time]   (amplitude)
      .EEG.Ft8 :Curve   [time]   (amplitude)
      .EEG.T7 :Curve   [time]   (amplitude)
      .EEG.T8 :Curve   [time]   (amplitude)
      .EEG.T9 :Curve   [time]   (amplitude)
      .EEG.T10 :Curve   [time]   (amplitude)
      .EEG.Tp7 :Curve   [time]   (amplitude)
      .EEG.Tp8 :Curve   [time]   (amplitude)
      .EEG.P7 :Curve   [time]   (amplitude)
      .EEG.P5 :Curve   [time]   (amplitude)
      .EEG.P3 :Curve   [time]   (amplitude)
      .EEG.P1 :Curve   [time]   (amplitude)
      .EEG.Pz :Curve   [time]   (amplitude)
      .EEG.P2 :Curve   [time]   (amplitude)
      .EEG.P4 :Curve   [time]   (amplitude)
      .EEG.P6 :Curve   [time]   (amplitude)
      .EEG.P8 :Curve   [time]   (amplitude)
      .EEG.Po7 :Curve   [time]   (amplitude)
      .EEG.Po3 :Curve   [time]   (amplitude)
      .EEG.Poz :Curve   [time]   (amplitude)
      .EEG.Po4 :Curve   [time]   (amplitude)
      .EEG.Po8 :Curve   [time]   (amplitude)
      .EEG.O1 :Curve   [time]   (amplitude)
      .EEG.Oz :Curve   [time]   (amplitude)
      .EEG.O2 :Curve   [time]   (amplitude)
      .EEG.Iz :Curve   [time]   (amplitude)
```

Print curve overlay without redim

```python
:Overlay
   .Fc5 :Curve   [time]   (Fc5)
   .Fc3 :Curve   [time]   (Fc3)
   .Fc1 :Curve   [time]   (Fc1)
   .Fcz :Curve   [time]   (Fcz)
   .Fc2 :Curve   [time]   (Fc2)
   .Fc4 :Curve   [time]   (Fc4)
   .Fc6 :Curve   [time]   (Fc6)
   .C5 :Curve   [time]   (C5)
   .C3 :Curve   [time]   (C3)
   .C1 :Curve   [time]   (C1)
   .Cz :Curve   [time]   (Cz)
   .C2 :Curve   [time]   (C2)
   .C4 :Curve   [time]   (C4)
   .C6 :Curve   [time]   (C6)
   .Cp5 :Curve   [time]   (Cp5)
   .Cp3 :Curve   [time]   (Cp3)
   .Cp1 :Curve   [time]   (Cp1)
   .Cpz :Curve   [time]   (Cpz)
   .Cp2 :Curve   [time]   (Cp2)
   .Cp4 :Curve   [time]   (Cp4)
   .Cp6 :Curve   [time]   (Cp6)
   .Fp1 :Curve   [time]   (Fp1)
   .Fpz :Curve   [time]   (Fpz)
   .Fp2 :Curve   [time]   (Fp2)
   .Af7 :Curve   [time]   (Af7)
   .Af3 :Curve   [time]   (Af3)
   .Afz :Curve   [time]   (Afz)
   .Af4 :Curve   [time]   (Af4)
   .Af8 :Curve   [time]   (Af8)
   .F7 :Curve   [time]   (F7)
   .F5 :Curve   [time]   (F5)
   .F3 :Curve   [time]   (F3)
   .F1 :Curve   [time]   (F1)
   .Fz :Curve   [time]   (Fz)
   .F2 :Curve   [time]   (F2)
   .F4 :Curve   [time]   (F4)
   .F6 :Curve   [time]   (F6)
   .F8 :Curve   [time]   (F8)
   .Ft7 :Curve   [time]   (Ft7)
   .Ft8 :Curve   [time]   (Ft8)
   .T7 :Curve   [time]   (T7)
   .T8 :Curve   [time]   (T8)
   .T9 :Curve   [time]   (T9)
   .T10 :Curve   [time]   (T10)
   .Tp7 :Curve   [time]   (Tp7)
   .Tp8 :Curve   [time]   (Tp8)
   .P7 :Curve   [time]   (P7)
   .P5 :Curve   [time]   (P5)
   .P3 :Curve   [time]   (P3)
   .P1 :Curve   [time]   (P1)
   .Pz :Curve   [time]   (Pz)
   .P2 :Curve   [time]   (P2)
   .P4 :Curve   [time]   (P4)
   .P6 :Curve   [time]   (P6)
   .P8 :Curve   [time]   (P8)
   .Po7 :Curve   [time]   (Po7)
   .Po3 :Curve   [time]   (Po3)
   .Poz :Curve   [time]   (Poz)
   .Po4 :Curve   [time]   (Po4)
   .Po8 :Curve   [time]   (Po8)
   .O1 :Curve   [time]   (O1)
   .Oz :Curve   [time]   (Oz)
   .O2 :Curve   [time]   (O2)
   .Iz :Curve   [time]   (Iz)
```

jbednar commented 1 month ago

I'm happy with your analysis of the issue and I think I'm happy with your proposed solution.

If I'm following along correctly, users could hit the same kind of problem that keeps multi_y from being the default: people have previously been sloppy about declaring dimensions in Overlays. I.e. people may have declared a label for only one Element in an overlay, since that's all that's needed to get the axis label to update, and now only that one plot will match dimensions for things like shared_axes and link_selections. Not handling sloppy code like that isn't a fatal issue with the approach, but it would be good to work out exactly when and if it would occur so that we can guide users.

In any case, would we then build on this support to add something at the hv.Dataset level where we can easily do a groupby or by on the wide dataframe and get this behavior, without explicitly having to construct an overlay or layout?

droumis commented 1 month ago

> In any case, would we then build on this support to add something at the hv.Dataset level where we can easily do a groupby or by on the wide dataframe and get this behavior, without explicitly having to construct an overlay or layout?

Maybe we could avoid the explicit overlay construction at the hvPlot level.

I personally don't think the added brevity is a top priority in a HoloViews-dominant workflow.

philippjfr commented 1 month ago

> In any case, would we then build on this support to add something at the hv.Dataset level where we can easily do a groupby or by on the wide dataframe and get this behavior, without explicitly having to construct an overlay or layout?

Agree with @droumis, I'm honestly fine with leaving that to hvPlot but also wouldn't be opposed if someone wanted to propose such an API for Dataset.

jbednar commented 1 month ago

Doing that at the hvPlot level makes good sense, yes.

I'm not worried about brevity so much as consistency, i.e. to ensure that there is a clean, well-supported, well-documented, tested way to work easily with a wide dataframe. Giving whatever way that is a name is one way to ensure that, but it can be done with documentation and examples instead if the code is clean. Would be good to see an example here of the HoloViews code that would be used to create a plot of one timeseries from such a dataframe at a time with a selector widget to select the stock name, in the absence of new API.

philippjfr commented 4 weeks ago

Here's what that looks like:

df = pd.read_csv('https://datasets.holoviz.org/stocks/v1/stocks.csv', parse_dates=['Date']).set_index('Date')

hv.NdOverlay({col: hv.Curve(df, 'Date', (col, 'Price')) for col in df.columns}, 'Ticker')

droumis commented 3 weeks ago

@philippjfr, how would I now adapt this code mentioned above? I'm seeing errors:

with the redim

```python
Traceback (most recent call last):
  File "/Users/droumis/src/holoviews/holoviews/plotting/bokeh/element.py", line 2047, in _init_glyphs
    renderer, glyph = self._init_glyph(plot, mapping, properties)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/droumis/src/holoviews/holoviews/plotting/bokeh/element.py", line 1726, in _init_glyph
    center = y_source_range.tags[1]['subcoordinate_y']
             ~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^
KeyError: 'subcoordinate_y'
```

without the redim:

```python
WARNING:param.dynamic_operation: Callable raised "ValueError('y array must be contiguous.')".
Invoked as dynamic_operation(height=400, scale=1.0, width=400, x_range=None)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/IPython/core/formatters.py:974, in MimeBundleFormatter.__call__(self, obj, include, exclude)
    971 method = get_real_method(obj, self.print_method)
    973 if method is not None:
--> 974     return method(include=include, exclude=exclude)
    975     return None
    976 else:

File ~/src/holoviews/holoviews/core/dimension.py:1275, in Dimensioned._repr_mimebundle_(self, include, exclude)
   1268 def _repr_mimebundle_(self, include=None, exclude=None):
   1269     """
   1270     Resolves the class hierarchy for the class rendering the
   1271     object using any display hooks registered on Store.display
   1272     hooks. The output of all registered display_hooks is then
   1273     combined and returned.
   1274     """
-> 1275     return Store.render(self)

File ~/src/holoviews/holoviews/core/options.py:1428, in Store.render(cls, obj)
   1426 data, metadata = {}, {}
   1427 for hook in hooks:
-> 1428     ret = hook(obj)
   1429     if ret is None:
   1430         continue

File ~/src/holoviews/holoviews/ipython/display_hooks.py:287, in pprint_display(obj)
    285 if not ip.display_formatter.formatters['text/plain'].pprint:
    286     return None
--> 287 return display(obj, raw_output=True)

File ~/src/holoviews/holoviews/ipython/display_hooks.py:258, in display(obj, raw_output, **kwargs)
    256 elif isinstance(obj, (Layout, NdLayout, AdjointLayout)):
    257     with option_state(obj):
--> 258         output = layout_display(obj)
    259 elif isinstance(obj, (HoloMap, DynamicMap)):
    260     with option_state(obj):

File ~/src/holoviews/holoviews/ipython/display_hooks.py:149, in display_hook.<locals>.wrapped(element)
    147 try:
    148     max_frames = OutputSettings.options['max_frames']
--> 149     mimebundle = fn(element, max_frames=max_frames)
    150     if mimebundle is None:
    151         return {}, {}

File ~/src/holoviews/holoviews/ipython/display_hooks.py:223, in layout_display(layout, max_frames)
    220     max_frame_warning(max_frames)
    221     return None
--> 223 return render(layout)

File ~/src/holoviews/holoviews/ipython/display_hooks.py:76, in render(obj, **kwargs)
     73 if renderer.fig == 'pdf':
     74     renderer = renderer.instance(fig='png')
---> 76 return renderer.components(obj, **kwargs)

File ~/src/holoviews/holoviews/plotting/renderer.py:396, in Renderer.components(self, obj, fmt, comm, **kwargs)
    394 embed = (not (dynamic or streams or self.widget_mode == 'live') or config.embed)
    395 if embed or config.comms == 'default':
--> 396     return self._render_panel(plot, embed, comm)
    397 return self._render_ipywidget(plot)

File ~/src/holoviews/holoviews/plotting/renderer.py:403, in Renderer._render_panel(self, plot, embed, comm)
    401 doc = Document()
    402 with config.set(embed=embed):
--> 403     model = plot.layout._render_model(doc, comm)
    404 if embed:
    405     return render_model(model, comm)

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/panel/viewable.py:736, in Viewable._render_model(self, doc, comm)
    734 if comm is None:
    735     comm = state._comm_manager.get_server_comm()
--> 736 model = self.get_root(doc, comm)
    738 if self._design and self._design.theme.bokeh_theme:
    739     doc.theme = self._design.theme.bokeh_theme

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/panel/layout/base.py:320, in Panel.get_root(self, doc, comm, preprocess)
    316 def get_root(
    317     self, doc: Optional[Document] = None, comm: Optional[Comm] = None,
    318     preprocess: bool = True
    319 ) -> Model:
--> 320     root = super().get_root(doc, comm, preprocess)
    321     # ALERT: Find a better way to handle this
    322     if hasattr(root, 'styles') and 'overflow-x' in root.styles:

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/panel/viewable.py:667, in Renderable.get_root(self, doc, comm, preprocess)
    665     wrapper = self._design._wrapper(self)
    666 if wrapper is self:
--> 667     root = self._get_model(doc, comm=comm)
    668     if preprocess:
    669         self._preprocess(root)

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/panel/layout/base.py:186, in Panel._get_model(self, doc, root, parent, comm)
    184 root = root or model
    185 self._models[root.ref['id']] = (model, parent)
--> 186 objects, _ = self._get_objects(model, [], doc, root, comm)
    187 props = self._get_properties(doc)
    188 props[self._property_mapping['objects']] = objects

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/panel/layout/base.py:168, in Panel._get_objects(self, model, old_objects, doc, root, comm)
    166 else:
    167     try:
--> 168         child = pane._get_model(doc, root, model, comm)
    169     except RerenderError as e:
    170         if e.layout is not None and e.layout is not self:

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/panel/pane/holoviews.py:429, in HoloViews._get_model(self, doc, root, parent, comm)
    427     plot = self.object
    428 else:
--> 429     plot = self._render(doc, comm, root)
    431 plot.pane = self
    432 backend = plot.renderer.backend

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/panel/pane/holoviews.py:525, in HoloViews._render(self, doc, comm, root)
    522 if comm:
    523     kwargs['comm'] = comm
--> 525 return renderer.get_plot(self.object, **kwargs)

File ~/src/holoviews/holoviews/plotting/bokeh/renderer.py:68, in BokehRenderer.get_plot(self_or_cls, obj, doc, renderer, **kwargs)
     61 @bothmethod
     62 def get_plot(self_or_cls, obj, doc=None, renderer=None, **kwargs):
     63     """
     64     Given a HoloViews Viewable return a corresponding plot instance.
     65     Allows supplying a document attach the plot to, useful when
     66     combining the bokeh model with another plot.
     67     """
---> 68     plot = super().get_plot(obj, doc, renderer, **kwargs)
     69     if plot.document is None:
     70         plot.document = Document() if self_or_cls.notebook_context else curdoc()

File ~/src/holoviews/holoviews/plotting/renderer.py:216, in Renderer.get_plot(self_or_cls, obj, doc, renderer, comm, **kwargs)
    213     raise SkipRendering(msg.format(dims=dims))
    215 # Initialize DynamicMaps with first data item
--> 216 initialize_dynamic(obj)
    218 if not renderer:
    219     renderer = self_or_cls

File ~/src/holoviews/holoviews/plotting/util.py:270, in initialize_dynamic(obj)
    268     continue
    269 if not len(dmap):
--> 270     dmap[dmap._initial_key()]

File ~/src/holoviews/holoviews/core/spaces.py:1216, in DynamicMap.__getitem__(self, key)
   1214 # Not a cross product and nothing cached so compute element.
   1215 if cache is not None: return cache
-> 1216 val = self._execute_callback(*tuple_key)
   1217 if data_slice:
   1218     val = self._dataslice(val, data_slice)

File ~/src/holoviews/holoviews/core/spaces.py:983, in DynamicMap._execute_callback(self, *args)
    980     kwargs['_memoization_hash_'] = hash_items
    982 with dynamicmap_memoization(self.callback, self.streams):
--> 983     retval = self.callback(*args, **kwargs)
    984 return self._style(retval)

File ~/src/holoviews/holoviews/core/spaces.py:581, in Callable.__call__(self, *args, **kwargs)
    578     args, kwargs = (), dict(pos_kwargs, **kwargs)
    580 try:
--> 581     ret = self.callable(*args, **kwargs)
    582 except KeyError:
    583     # KeyError is caught separately because it is used to signal
    584     # invalid keys on DynamicMap and should not warn
    585     raise

File ~/src/holoviews/holoviews/util/__init__.py:1039, in Dynamic._dynamic_operation.<locals>.dynamic_operation(*key, **kwargs)
   1037 def dynamic_operation(*key, **kwargs):
   1038     key, obj = resolve(key, kwargs)
-> 1039     return apply(obj, *key, **kwargs)

File ~/src/holoviews/holoviews/util/__init__.py:1031, in Dynamic._dynamic_operation.<locals>.apply(element, *key, **kwargs)
   1029 def apply(element, *key, **kwargs):
   1030     kwargs = dict(util.resolve_dependent_kwargs(self.p.kwargs), **kwargs)
-> 1031     processed = self._process(element, key, kwargs)
   1032     if (self.p.link_dataset and isinstance(element, Dataset) and
   1033         isinstance(processed, Dataset) and processed._dataset is None):
   1034         processed._dataset = element.dataset

File ~/src/holoviews/holoviews/util/__init__.py:1013, in Dynamic._process(self, element, key, kwargs)
   1011 elif isinstance(self.p.operation, Operation):
   1012     kwargs = {k: v for k, v in kwargs.items() if k in self.p.operation.param}
-> 1013     return self.p.operation.process_element(element, key, **kwargs)
   1014 else:
   1015     return self.p.operation(element, **kwargs)

File ~/src/holoviews/holoviews/core/operation.py:194, in Operation.process_element(self, element, key, **params)
    191 else:
    192     self.p = param.ParamOverrides(self, params,
    193                                   allow_extra_keywords=self._allow_extra_keywords)
--> 194 return self._apply(element, key)

File ~/src/holoviews/holoviews/core/operation.py:141, in Operation._apply(self, element, key)
    139 if not in_method:
    140     element._in_method = True
--> 141 ret = self._process(element, key)
    142 if hasattr(element, '_in_method') and not in_method:
    143     element._in_method = in_method

File ~/src/holoviews/holoviews/operation/downsample.py:242, in downsample1d._process(self, element, key, shared_data)
    240 _process = partial(self._process, **kwargs)
    241 if isinstance(element, Overlay):
--> 242     elements = [v.map(_process) for v in element]
    243 else:
    244     elements = {k: v.map(_process) for k, v in element.items()}

File ~/src/holoviews/holoviews/core/data/__init__.py:196, in PipelineMeta.pipelined.<locals>.pipelined_fn(*args, **kwargs)
    193     inst._in_method = True
    195 try:
--> 196     result = method_fn(*args, **kwargs)
    197     if PipelineMeta.disable:
    198         return result

File ~/src/holoviews/holoviews/core/data/__init__.py:1213, in Dataset.map(self, *args, **kwargs)
   1211 @wraps(LabelledData.map)
   1212 def map(self, *args, **kwargs):
-> 1213     return super().map(*args, **kwargs)

File ~/src/holoviews/holoviews/core/dimension.py:695, in LabelledData.map(self, map_fn, specs, clone)
    693     return deep_mapped
    694 else:
--> 695     return map_fn(self) if applies else self

File ~/src/holoviews/holoviews/operation/downsample.py:270, in downsample1d._process(self, element, key, shared_data)
    268 elif self.p.algorithm == "minmax-lttb":
    269     kwargs['minmax_ratio'] = self.p.minmax_ratio
--> 270 samples = downsample(xs, ys, self.p.width, parallel=self.p.parallel, **kwargs)
    271 return element.iloc[samples]

File ~/src/holoviews/holoviews/operation/downsample.py:181, in _min_max_lttb(x, y, n_out, **kwargs)
    176 except ModuleNotFoundError:
    177     raise NotImplementedError(
    178         'The minmax-lttb downsampling algorithm requires the tsdownsample '
    179         'library to be installed.'
    180     ) from None
--> 181 return MinMaxLTTBDownsampler().downsample(x, y, n_out=n_out, **kwargs)

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/tsdownsample/downsamplers.py:114, in MinMaxLTTBDownsampler.downsample(self, n_out, minmax_ratio, parallel, *args, **_)
    110 def downsample(
    111     self, *args, n_out: int, minmax_ratio: int = 4, parallel: bool = False, **_
    112 ):
    113     assert minmax_ratio > 0, "minmax_ratio must be greater than 0"
--> 114     return super().downsample(
    115         *args, n_out=n_out, parallel=parallel, ratio=minmax_ratio
    116     )

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/tsdownsample/downsampling_interface.py:376, in AbstractRustDownsampler.downsample(self, n_out, parallel, *args, **kwargs)
    368 def downsample(self, *args, n_out: int, parallel: bool = False, **kwargs):
    369     """Downsample the data in x and y.
    370
    371     The x and y arguments are positional-only arguments. If only one argument is
   (...)
    374     considered to be the y-data.
    375     """
--> 376     return super().downsample(*args, n_out=n_out, parallel=parallel, **kwargs)

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/tsdownsample/downsampling_interface.py:131, in AbstractDownsampler.downsample(self, n_out, *args, **kwargs)
    129 x, y = self._check_valid_downsample_args(*args)
    130 self._supports_dtype(y, y=True)
--> 131 self._check_contiguous(y, y=True)
    132 if x is not None:
    133     self._supports_dtype(x, y=False)

File ~/opt/miniconda3/envs/neuro-multi-chan/lib/python3.12/site-packages/tsdownsample/downsampling_interface.py:38, in AbstractDownsampler._check_contiguous(self, arr, y)
     35 if arr.flags["C_CONTIGUOUS"]:
     36     return
---> 38 raise ValueError(f"{'y' if y else 'x'} array must be contiguous.")

ValueError: y array must be contiguous.
```

Or, not using the redim, but using the mapping as in your stocks example (curve = hv.Curve(df, kdims=[time_dim], vdims=[(channel_name, 'amplitude')])) produces the same error as with the redim.
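For readers unfamiliar with the "must be contiguous" check in the traceback: the thread does not say why the y array is non-contiguous in this case. One plausible cause, stated here only as an assumption, is that a single column taken from a 2D row-major block is a strided view rather than a packed buffer. The NumPy-only sketch below illustrates that general behaviour with a made-up array; it does not reproduce the HoloViews or tsdownsample code path.

```python
import numpy as np

# Hypothetical stand-in for a wide table: 5 samples (rows) x 3 channels (columns),
# stored row-major (C order), so consecutive values of one column are not adjacent.
wide = np.arange(15, dtype="float64").reshape(5, 3)

column = wide[:, 1]  # a view into the 2D block, no copy is made
print(column.flags["C_CONTIGUOUS"])

fixed = np.ascontiguousarray(column)  # copies the values into a packed buffer
print(fixed.flags["C_CONTIGUOUS"])
```

`np.ascontiguousarray` satisfies such a check at the cost of a copy per column, which works against sharing a single wide block between elements.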

droumis commented 3 weeks ago

Simplifying to match the stocks example:

philippjfr commented 3 weeks ago

One thing that really confused things was this:

hv.Overlay(curves, kdims="channel")

Overlays do not support key dimensions so this should be disallowed.

philippjfr commented 3 weeks ago

Across these three PRs this should now be fixed:

holoviz / holoviews

Proposal for improving support for wide data #6260