holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

.select() keeps unnecessary dimensions, and Elements plot them #4564

Open rchurt opened 4 years ago

rchurt commented 4 years ago

ALL software version info

holoviews==1.13.3
bokeh==2.1.1
python==3.6.9

Google Colab Chrome 84.0.4147.89

Description of expected behavior and the observed behavior

Let's say I create and slice a Dataset like so:

import holoviews as hv
import xarray as xr
import numpy as np

hv.extension('bokeh')

data = xr.DataArray(np.random.randn(2, 100, 100, 3),
                    dims=('scan_type', 'x', 'y', 'z'),
                    name='Intensity',
                    coords={'scan_type':['BF', 'fluorescence'],
                            'x':np.arange(100),
                            'y':np.arange(100),
                            'z':np.arange(3)})

ds = hv.Dataset(data)
x_range = slice(20,50)
y_range = slice(50,80)
z_point = 1

sliced_ds = ds.select(x=x_range, y=y_range, z=z_point)

The first problem is that .select() keeps unnecessary dimensions. For example, compare the output of this...

sliced_ds.data.sizes

...to slicing the original xarray...

sliced_xarray = data.loc[{'x':x_range, 'y':y_range, 'z':z_point}]
sliced_xarray.sizes

...and you'll see that the slice of the original xarray doesn't keep the unnecessary dimension z.
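For reference, plain xarray treats a scalar label differently from a one-element list, which may help explain the size-1 z above. The sketch below uses only xarray and the same array as in the example. It does not claim to show what .select() does internally, only how the two selection styles differ and how a leftover length-1 dimension can be squeezed away:

```python
import numpy as np
import xarray as xr

# Same array as in the example above.
data = xr.DataArray(np.random.randn(2, 100, 100, 3),
                    dims=('scan_type', 'x', 'y', 'z'),
                    name='Intensity',
                    coords={'scan_type': ['BF', 'fluorescence'],
                            'x': np.arange(100),
                            'y': np.arange(100),
                            'z': np.arange(3)})

# Scalar label: 'z' is removed from the dimensions.
scalar_sel = data.sel(z=1)

# One-element list: 'z' is kept as a dimension of length 1.
list_sel = data.sel(z=[1])

# squeeze() removes the length-1 dimension; drop=True also discards
# the leftover scalar 'z' coordinate.
squeezed = list_sel.squeeze('z', drop=True)

print(dict(scalar_sel.sizes))
print(dict(list_sel.sizes))
print(dict(squeezed.sizes))
```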


The next problem is that if I go to plot a dataset with an unnecessary dimension...

sliced_ds.to(hv.QuadMesh, kdims=['x', 'y'])

...it gives a dropdown to select the unnecessary dimension...

[Screenshot: Screen Shot 2020-08-20 at 4 05 19 PM]

...when what I really want is this:

sliced_xarray_ds = hv.Dataset(sliced_xarray)
sliced_xarray_ds.to(hv.QuadMesh, kdims=['x', 'y'])

[Screenshot: Screen Shot 2020-08-20 at 4 09 59 PM]

In this example the extra dimension does little harm, but it became a problem when I later tried to use that dimension (i.e., to iterate through it to make a GridSpace), because each plot in the GridSpace still carries the dimension, just with a different value for it.

Fixing either of these issues (i.e., the extra dimensions being kept by .select() and being plotted) would fix the problem, but perhaps it would be worth fixing both?

philippjfr commented 4 years ago

Thanks for the thoughtful writeup.

"The next problem is that if I go to plot a dataset with an unnecessary dimension..."

This is definitely a bug (or rather a regression); you should not get a widget for a constant dimension.

I also agree with the larger issue of dropping dimensions on select; it is something that has bothered me before. Given the backward-compatibility constraints here, I would say we should start by implementing a drop keyword or similar on select and then consider making it the default for 2.0.

rchurt commented 4 years ago

Agreed, sounds like a good plan!