holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License
2.71k stars 404 forks source link

Improved xarray support #2319

Closed drs251 closed 4 years ago

drs251 commented 6 years ago

I noticed some points concerning xarray support in holoviews which could be improved:

EDIT:

philippjfr commented 6 years ago

I think it would be pretty straightforward to add this feature in the holoviews importing and exporting code, I could look into it.

Agreed this should be pretty straightforward, this is where you'd implement that.

When importing a netcdf file via xarray and passing it to hv.Dataset, the key dimensions are sometimes ordered differently, depending whether the file is imported as a xr.DataArrayor xr.Dataset.

Agreed, this should be consistent and should be handled in the same place as the previous issue.

I've never had any luck converting the data storage backend from 'xarray' to anything else, writing img.clone(datatype=['image']),

Not all conversions are currently supported, generally we only support converting the standard formats, i.e. dictionary based representations, but full ability to convert between all formats would be a good goal to shoot for.

philippjfr commented 6 years ago

I've never had any luck converting the data storage backend from 'xarray' to anything else

Actually a bit of code in Dataset.clone could probably detect if the current datatype is in the list of datatypes that the user supplied and if not just drop down to a generic format. That will take care of almost all conversions I think.

drs251 commented 6 years ago
import holoviews as hv
import xarray as xr
import numpy as np
hv.notebook_extension()
%opts Image [colorbar=True]
xs = np.linspace(0, 2.5, 51)
ys = np.linspace(0, 1, 21)
zs = np.random.random((21, 51))
arr = xr.DataArray(zs, coords=[("y", ys), ("x", xs)], name="field")
x_dim = hv.Dimension("x", unit="m")
y_dim = hv.Dimension("y", unit="m")
z_dim = hv.Dimension("field", unit="mT")
dataset = hv.Dataset(arr, kdims=[x_dim, y_dim], vdims=[z_dim])
dataset.to(hv.Image)

The unit mT does not show up next to the colorbar.

More generally it would be nice to have improved support for changing dimension names, labels and units, not just for xarray related stuff - the .redim utility seems to be unable to change existing labels and units, which IMO is frustrating sometimes.

philippjfr commented 6 years ago

It seems like your latest changes to xarray overwrite existing units on the dimensions.

the .redim utility seems to be unable to change existing labels and units, which IMO is frustrating sometimes.

Concrete examples would be appreciated, I'm not currently aware of any issues with .redim atm.

drs251 commented 6 years ago

It seems like your latest changes to xarray overwrite existing units on the dimensions.

That's a point I did not think about previously (I'm still using 1.9.5, not sure if it's a good idea to use the current master for everyday work). Ideally, specified key and value dimensions should override anything thats present in the xarray, but I haven't checked if that's the case - I'll look into it, perhaps another PR is needed.

Concrete examples would be appreciated, I'm not currently aware of any issues with .redim atm.

xdim = hv.Dimension("x", label="my x dimension", unit="mJ")
ydim = hv.Dimension("y", label="my y dimension", unit="kB")
zdim = hv.Dimension("z", label="my z dimension", unit="fs")
image = hv.Dataset(([0, 1], [1, 2], [[3, 4], [5, 6]]), kdims=[xdim, ydim], vdims=[zdim]).to(hv.Image)
image = image.redim(z='time')
image = image.redim.unit(x='eV', y='TB')
image = image.redim.label(x='Energy', y='Capacity')

Actually, I saw that my only issue with redim is that it refuses to change existing labels - everything else works like I would expect it to. So as far as I know, if you come across a dimension label you don't like, the only option is to extract the dimension values and rebuild the element from scratch. I would prefer if redim worked the same way on labels as it does on units and dimension names.

philippjfr commented 6 years ago

Tbh I agree, @jlstevens you originally objected to being able to override the label, but imo this is an annoying and artificial restriction. What was there some convincing reasoning?

jlstevens commented 6 years ago

The name and label define the identity of a dimension (e.g used to compare dimensions). As such the name and label should explicitly be declared correctly.

philippjfr commented 6 years ago

And using .redim isn't explicit enough? You can already override the dimension as a whole using redim so I really don't see what is gained by disallowing .redim.label except for making things inconsistent and more verbose/difficult for the user.

jlstevens commented 6 years ago

Explicitly declared.

drs251 commented 6 years ago

From my perspective, the label is what gets shown in the plot, and name is what you'd use to refer to the dimension programmatically, e.g. when using .reduce. I could understand if name were immutable and you could change around the label as you like, but I don't get why it's the other way around currently.

jlstevens commented 6 years ago

I actually agree with that @drs251 . Changing the dimension name was always a misfeature imho and I think it was introduced before label existed. Unfortunately, disabling the ability to change the name would now break backwards compatibility.

philippjfr commented 6 years ago

Explicitly declared.

My point was that you can already override both name and label using .redim so I really don't see why changing the label should not be supported as well.

jlstevens commented 6 years ago

My point was that you can already override both name and label using .redim so I really don't see why changing the label should not be supported as well.

I suppose so. The intended design is already broken.

philippjfr commented 6 years ago

Changing the dimension name was always a misfeature imho and I think it was introduced before label existed.

It's absolutely not a misfeature as it's crucial for certain operations, e.g. if we ever want to implement a melt operation and it's already used for similar operations in places.

I suppose so. The intended design is already broken.

How can you say that when completely replacing a dimension is an essential operation that absolutely must be supported.

jlstevens commented 6 years ago

How can you say that when completely replacing a dimension is an essential operation that absolutely must be supported.

For operations perhaps, but for changing the title on a plot? All it does is make things more difficult to reason about. Do you really think the semantics of a dimension change because you want a different title?

philippjfr commented 6 years ago

For operations perhaps, but for changing the title on a plot?

For changing the title of a plot axis you indeed shouldn't change the whole dimension, especially the name, which is precisely why disallowing changing the label easily is such a pain. Instead of simply using .redim.label to set a new label a user will either have to replace the entire dimension or simply rename it, which is the worst possible option but also happens to be the most convenient. A user who simply wants to assign a new axis label gets an error when (s)he tries .redim.label(x='My custom label') and simply ends up using .redim(x='My custom label') because it works, but is actually infinitely worse from your perspective, because it replaces the entire dimension instead of just the label.

For operations perhaps

But we don't have distinct mechanisms to do this inside an operation, so are you suggesting introducing a whole other API? There's many reasons to change the names of dimensions so a user should be able to do it somehow that doesn't require knowledge about the type of data, redim has been the API to do that for a long time.

jlstevens commented 6 years ago

Really, the point is that the title should be set as plot option, not by changing the dimension definition. What the dimension is has nothing to do with the title you want. Does your data fundamentally change in semantics because you want to tweak a title?

A user who simply wants to assign a new axis label gets an error when (s)he tries .redim.label(x='My custom label') and simply ends up using .redim(x='My custom label') because it works, but is actually infinitely worse from your perspective.

This I do agree with at least (as I mentioned above) which is a reason to allow the label to be changed. The entire idea of changing the core identity of a dimension to change a title is the real problem here.

drs251 commented 6 years ago
philippjfr commented 6 years ago

I also just noticed that the .to interface seems to be broken for datasets generated from xarrays on current master, not sure if it's related to my recent changes (but a test is definitely needed for this).

Could you please provide an example? There are extensive unit tests for .to and .groupby which are all passing. If it is failing that is a critical bug and needs to be fixed before a release tomorrow.

drs251 commented 6 years ago
import holoviews as hv
import xarray as xr
import numpy as np

xs = [0.1, 0.2, 0.3]
ys = [0, 1]
zs = np.array([[0, 1], [2, 3], [4, 5]])
da = xr.DataArray(zs, coords=[('x_dim', xs), ('y_dim', ys)], name="data_name", dims=['y_dim', 'x_dim'])
da.attrs['long_name'] = "data long name"
da.attrs['units'] = "array_unit"
da.x_dim.attrs['units'] = "x_unit"
da.y_dim.attrs['long_name'] = "y axis long name"
dataset = hv.Dataset(da)
image = dataset.to(hv.Image)

I get

Traceback (most recent call last):
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 779, in _construct_dataarray
    variable = self._variables[name]
KeyError: 'x'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/danielstephan/python/holoviews/test.py", line 14, in <module>
    image = dataset.to(hv.Image)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 143, in __call__
    element = new_type(selected, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/element/raster.py", line 256, in __init__
    Dataset.__init__(self, data, kdims=kdims, vdims=vdims, extents=extents, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 199, in __init__
    super(Dataset, self).__init__(data, **dict(kwargs, **dict(dims, **extra_kws)))
  File "/Users/danielstephan/python/holoviews/holoviews/element/raster.py", line 50, in __init__
    super(Raster, self).__init__(data, kdims=kdims, vdims=vdims, extents=extents, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/core/dimension.py", line 839, in __init__
    super(Dimensioned, self).__init__(data, **params)
  File "/Users/danielstephan/python/holoviews/holoviews/core/dimension.py", line 560, in __init__
    super(LabelledData, self).__init__(**params)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 1041, in __init__
    self._setup_params(**params)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 978, in override_initialization
    fn(parameterized_instance,*args,**kw)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 1441, in _setup_params
    setattr(self,name,val)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/__init__.py", line 615, in __set__
    super(Number,self).__set__(obj,val)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/__init__.py", line 452, in __set__
    if not obj: self._set_instantiate(dynamic)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 555, in __nonzero__
    return self.interface.nonzero(self)
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/interface.py", line 351, in nonzero
    return bool(cls.length(dataset))
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/xarray.py", line 432, in length
    return np.product([len(dataset.data[d.name]) for d in dataset.kdims])
  File "/Users/danielstephan/python/holoviews/holoviews/core/data/xarray.py", line 432, in <listcomp>
    return np.product([len(dataset.data[d.name]) for d in dataset.kdims])
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 873, in __getitem__
    return self._construct_dataarray(key)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 782, in _construct_dataarray
    self._variables, name, self._level_coords, self.dims)
  File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 72, in _get_virtual_variable
    ref_var = variables[ref_name]
KeyError: 'x'

This does not happen in version 1.9.5

jlstevens commented 6 years ago

This looks like the critical param bug that is in the process of being fixed. This will involve a new param release (hopefully today!) and the fix should simply be to install this new version (which will also be required by 1.10).

philippjfr commented 6 years ago

Thanks for getting back to us so quickly @drs251, that is indeed related to param. I'm actually very glad to hear that this doesn't affect 1.9.5 because I had thought this was broken for everyone.

philippjfr commented 4 years ago

I think all the issues here were addressed, if I'm mistaken please reopen a more specific issue.

github-actions[bot] commented 3 weeks ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.