Closed drs251 closed 4 years ago
I think it would be pretty straightforward to add this feature in the holoviews importing and exporting code, I could look into it.
Agreed this should be pretty straightforward, this is where you'd implement that.
When importing a netcdf file via xarray and passing it to hv.Dataset, the key dimensions are sometimes ordered differently, depending whether the file is imported as a xr.DataArrayor xr.Dataset.
Agreed, this should be consistent and should be handled in the same place as the previous issue.
I've never had any luck converting the data storage backend from 'xarray' to anything else, writing img.clone(datatype=['image']),
Not all conversions are currently supported, generally we only support converting the standard formats, i.e. dictionary based representations, but full ability to convert between all formats would be a good goal to shoot for.
I've never had any luck converting the data storage backend from 'xarray' to anything else
Actually a bit of code in Dataset.clone could probably detect if the current datatype is in the list of datatypes that the user supplied and if not just drop down to a generic format. That will take care of almost all conversions I think.
import holoviews as hv
import xarray as xr
import numpy as np
hv.notebook_extension()
%opts Image [colorbar=True]
xs = np.linspace(0, 2.5, 51)
ys = np.linspace(0, 1, 21)
zs = np.random.random((21, 51))
arr = xr.DataArray(zs, coords=[("y", ys), ("x", xs)], name="field")
x_dim = hv.Dimension("x", unit="m")
y_dim = hv.Dimension("y", unit="m")
z_dim = hv.Dimension("field", unit="mT")
dataset = hv.Dataset(arr, kdims=[x_dim, y_dim], vdims=[z_dim])
dataset.to(hv.Image)
The unit mT does not show up next to the colorbar.
More generally it would be nice to have improved support for changing dimension names, labels and units, not just for xarray related stuff - the .redim
utility seems to be unable to change existing labels and units, which IMO is frustrating sometimes.
It seems like your latest changes to xarray overwrite existing units on the dimensions.
the .redim utility seems to be unable to change existing labels and units, which IMO is frustrating sometimes.
Concrete examples would be appreciated, I'm not currently aware of any issues with .redim
atm.
It seems like your latest changes to xarray overwrite existing units on the dimensions.
That's a point I did not think about previously (I'm still using 1.9.5, not sure if it's a good idea to use the current master for everyday work). Ideally, specified key and value dimensions should override anything thats present in the xarray, but I haven't checked if that's the case - I'll look into it, perhaps another PR is needed.
Concrete examples would be appreciated, I'm not currently aware of any issues with .redim atm.
xdim = hv.Dimension("x", label="my x dimension", unit="mJ")
ydim = hv.Dimension("y", label="my y dimension", unit="kB")
zdim = hv.Dimension("z", label="my z dimension", unit="fs")
image = hv.Dataset(([0, 1], [1, 2], [[3, 4], [5, 6]]), kdims=[xdim, ydim], vdims=[zdim]).to(hv.Image)
image = image.redim(z='time')
image = image.redim.unit(x='eV', y='TB')
image = image.redim.label(x='Energy', y='Capacity')
Actually, I saw that my only issue with redim is that it refuses to change existing labels - everything else works like I would expect it to. So as far as I know, if you come across a dimension label you don't like, the only option is to extract the dimension values and rebuild the element from scratch. I would prefer if redim worked the same way on labels as it does on units and dimension names.
Tbh I agree, @jlstevens you originally objected to being able to override the label, but imo this is an annoying and artificial restriction. What was there some convincing reasoning?
The name
and label
define the identity of a dimension (e.g used to compare dimensions). As such the name and label should explicitly be declared correctly.
And using .redim isn't explicit enough? You can already override the dimension as a whole using redim so I really don't see what is gained by disallowing .redim.label
except for making things inconsistent and more verbose/difficult for the user.
Explicitly declared.
From my perspective, the label
is what gets shown in the plot, and name
is what you'd use to refer to the dimension programmatically, e.g. when using .reduce
. I could understand if name
were immutable and you could change around the label as you like, but I don't get why it's the other way around currently.
I actually agree with that @drs251 . Changing the dimension name was always a misfeature imho and I think it was introduced before label
existed. Unfortunately, disabling the ability to change the name would now break backwards compatibility.
Explicitly declared.
My point was that you can already override both name and label using .redim
so I really don't see why changing the label should not be supported as well.
My point was that you can already override both name and label using .redim so I really don't see why changing the label should not be supported as well.
I suppose so. The intended design is already broken.
Changing the dimension name was always a misfeature imho and I think it was introduced before label existed.
It's absolutely not a misfeature as it's crucial for certain operations, e.g. if we ever want to implement a melt operation and it's already used for similar operations in places.
I suppose so. The intended design is already broken.
How can you say that when completely replacing a dimension is an essential operation that absolutely must be supported.
How can you say that when completely replacing a dimension is an essential operation that absolutely must be supported.
For operations perhaps, but for changing the title on a plot? All it does is make things more difficult to reason about. Do you really think the semantics of a dimension change because you want a different title?
For operations perhaps, but for changing the title on a plot?
For changing the title of a plot axis you indeed shouldn't change the whole dimension, especially the name, which is precisely why disallowing changing the label easily is such a pain. Instead of simply using .redim.label
to set a new label a user will either have to replace the entire dimension or simply rename it, which is the worst possible option but also happens to be the most convenient. A user who simply wants to assign a new axis label gets an error when (s)he tries .redim.label(x='My custom label')
and simply ends up using .redim(x='My custom label')
because it works, but is actually infinitely worse from your perspective, because it replaces the entire dimension instead of just the label.
For operations perhaps
But we don't have distinct mechanisms to do this inside an operation, so are you suggesting introducing a whole other API? There's many reasons to change the names of dimensions so a user should be able to do it somehow that doesn't require knowledge about the type of data, redim
has been the API to do that for a long time.
Really, the point is that the title should be set as plot option, not by changing the dimension definition. What the dimension is has nothing to do with the title you want. Does your data fundamentally change in semantics because you want to tweak a title?
A user who simply wants to assign a new axis label gets an error when (s)he tries .redim.label(x='My custom label') and simply ends up using .redim(x='My custom label') because it works, but is actually infinitely worse from your perspective.
This I do agree with at least (as I mentioned above) which is a reason to allow the label to be changed. The entire idea of changing the core identity of a dimension to change a title is the real problem here.
.to
interface seems to be broken for datasets generated from xarrays on current master, not sure if it's related to my recent changes (but a test is definitely needed for this).I also just noticed that the .to interface seems to be broken for datasets generated from xarrays on current master, not sure if it's related to my recent changes (but a test is definitely needed for this).
Could you please provide an example? There are extensive unit tests for .to and .groupby which are all passing. If it is failing that is a critical bug and needs to be fixed before a release tomorrow.
import holoviews as hv
import xarray as xr
import numpy as np
xs = [0.1, 0.2, 0.3]
ys = [0, 1]
zs = np.array([[0, 1], [2, 3], [4, 5]])
da = xr.DataArray(zs, coords=[('x_dim', xs), ('y_dim', ys)], name="data_name", dims=['y_dim', 'x_dim'])
da.attrs['long_name'] = "data long name"
da.attrs['units'] = "array_unit"
da.x_dim.attrs['units'] = "x_unit"
da.y_dim.attrs['long_name'] = "y axis long name"
dataset = hv.Dataset(da)
image = dataset.to(hv.Image)
I get
Traceback (most recent call last):
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 779, in _construct_dataarray
variable = self._variables[name]
KeyError: 'x'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/danielstephan/python/holoviews/test.py", line 14, in <module>
image = dataset.to(hv.Image)
File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 143, in __call__
element = new_type(selected, **params)
File "/Users/danielstephan/python/holoviews/holoviews/element/raster.py", line 256, in __init__
Dataset.__init__(self, data, kdims=kdims, vdims=vdims, extents=extents, **params)
File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 199, in __init__
super(Dataset, self).__init__(data, **dict(kwargs, **dict(dims, **extra_kws)))
File "/Users/danielstephan/python/holoviews/holoviews/element/raster.py", line 50, in __init__
super(Raster, self).__init__(data, kdims=kdims, vdims=vdims, extents=extents, **params)
File "/Users/danielstephan/python/holoviews/holoviews/core/dimension.py", line 839, in __init__
super(Dimensioned, self).__init__(data, **params)
File "/Users/danielstephan/python/holoviews/holoviews/core/dimension.py", line 560, in __init__
super(LabelledData, self).__init__(**params)
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 1041, in __init__
self._setup_params(**params)
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 978, in override_initialization
fn(parameterized_instance,*args,**kw)
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/parameterized.py", line 1441, in _setup_params
setattr(self,name,val)
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/__init__.py", line 615, in __set__
super(Number,self).__set__(obj,val)
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/param/__init__.py", line 452, in __set__
if not obj: self._set_instantiate(dynamic)
File "/Users/danielstephan/python/holoviews/holoviews/core/data/__init__.py", line 555, in __nonzero__
return self.interface.nonzero(self)
File "/Users/danielstephan/python/holoviews/holoviews/core/data/interface.py", line 351, in nonzero
return bool(cls.length(dataset))
File "/Users/danielstephan/python/holoviews/holoviews/core/data/xarray.py", line 432, in length
return np.product([len(dataset.data[d.name]) for d in dataset.kdims])
File "/Users/danielstephan/python/holoviews/holoviews/core/data/xarray.py", line 432, in <listcomp>
return np.product([len(dataset.data[d.name]) for d in dataset.kdims])
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 873, in __getitem__
return self._construct_dataarray(key)
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 782, in _construct_dataarray
self._variables, name, self._level_coords, self.dims)
File "/Users/danielstephan/anaconda3/envs/holoviews_dev/lib/python3.6/site-packages/xarray/core/dataset.py", line 72, in _get_virtual_variable
ref_var = variables[ref_name]
KeyError: 'x'
This does not happen in version 1.9.5
This looks like the critical param bug that is in the process of being fixed. This will involve a new param release (hopefully today!) and the fix should simply be to install this new version (which will also be required by 1.10).
Thanks for getting back to us so quickly @drs251, that is indeed related to param. I'm actually very glad to hear that this doesn't affect 1.9.5 because I had thought this was broken for everyone.
I think all the issues here were addressed, if I'm mistaken please reopen a more specific issue.
This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.
I noticed some points concerning xarray support in holoviews which could be improved:
[x] Both xarray and holoviews support measurements units, but it appears that they are lost during in- and exporting. I think it would be pretty straightforward to add this feature in the holoviews importing and exporting code, I could look into it.
[x] When importing a netcdf file via xarray and passing it to
hv.Dataset
, the key dimensions are sometimes ordered differently, depending whether the file is imported as axr.DataArray
orxr.Dataset
. This is not a gigantic problem, especially since one can manually specify the key dimensions' order, although it is still pretty annoying when dealing with images... EDIT: (actually, this seems to have been fixed already sometime between 1.9.3 and the current master)[x] I've never had any luck converting the data storage backend from 'xarray' to anything else, writing
img.clone(datatype=['image'])
, as shown in the gridded datasets guide: http://holoviews.org/user_guide/Gridded_Datasets.html Perhaps it would be nice to show this more explicitly in the documentation, or fix it if it is a bug.EDIT:
[x] I've noticed that dimensions without coordinates are not accepted by
XArrayInterface
- although omitting coordinates seems to be the recommended way in xarray to deal with dimensions without meaningful coordinates (such as pixels in an image). In this case, it should be possible to just automatically assign simple range coordinates to these dimensions when making a holoviews Dataset.[ ] The treatment of non-dimension coordinates could be more concise. Take for example the "rasm" example that comes with xarray, which looks like this
Right now, coordinates are converted into kdims, and dimensions without coordinates are ignored. IMO, it would be better to let the user select if they want (time, y, c) or (time, yc, xc) coordinates when making the holoviews Dataset, where the default should be the proper dimensions.
[x] The xarray interface in holoviews could also benefit from more documentation in general - i.e. an example of how to load an example file from xarray into holoviews and make a plot. Introducing xarray support by cloning from a different datatype does not seem very intuitive to me.