Open arafune opened 4 months ago
hv.QuadMesh has same problem.
Further,
While dataarray_a.plot()
and daarray_b.plot()
show the same ouptuts,
hv.extension("matplotlib")
hv.Image(dataarray_a)
hv.Image(dataarray_b)
shows different results.
Thanks for pointing out this behavior. In your opinion, what should be the determining factor about what goes on the horizontal axis and what goes on the vertical? Should it be the order of the dims, the order of the coords, or some other convention?
Thanks for pointing out this behavior. In your opinion, what should be the determining factor about what goes on the horizontal axis and what goes on the vertical? Should it be the order of the dims, the order of the coords, or some other convention?
I believe it should follow the order of the dims, which is a sequence object. If the dims of the xarray.DataArray were a set like object, more discussion and consideration might be needed. However, as it is a tuple base, we do not think there is much room for discussion.
I feel that the problem would be that there is no policy for xarrays to plot by using holoviews. I think it is important to determine what that policy is.
Personally, I think the simplest and most robust way to make holoviews robust is to make it impossible to plot Datasets directly (only DataArray can be used.)
Enforcing the rule that users cannot pass Datasets as arguments may be seen as reducing convenience. However, by limiting it to DataArray, it can be used without the need to specify many optional arguments.
Please think about it.
As the items in a dict object may be (essentially) ordered. I know the current Python dict is, but I believe HoloViews should not rely on it...
Python dicts have been ordered for the past sixteen years, and I am not aware of any proposal to ever make them not be ordered again, so I believe it is entirely valid to rely on their ordering when appropriate.
the simplest and most robust way to make holoviews robust is to make it impossible to plot Datasets directly (only DataArray can be used.)
I personally consider plotting Datasets to be useful, so I'd be against removing that capability in any case, but more importantly it's existing capability that would cause problems for existing workflows if we removed it, so I'm -1 on this suggestion.
I feel that the problem would be that there is no policy for xarrays to plot by using holoviews. I think it is important to determine what that policy is.
So far we have tried to follow what Xarray and the CF conventions that are used with Xarray specify, as you can see in previous issues raised about this topic where we changed behavior to what it is presently. But I do not think that our documentation clearly lays out that policy and the conventions we respect, and so I think we need to review the current behavior, see if we continue to agree that it is behaving as intended, and document that more clearly. Certainly finding behavior for hv.Image that differs from da.plot is something we need to investigate and triple-check. In the meantime, I do not think that https://github.com/holoviz/holoviews/issues/6327 should be merged.
@jbednar
I understand your point of view. I'm just a very novice user of holoviews, and I didn't know the history of this library. Thus, it might be natural to ignore my request.
However, I still wondering why coords object is used to determine the X- and Y- axis.
Furthermore, though I know the current situation about the dict object in python about the order, I don't think we should rely on it. The dict object is not oriented for the ordered process, and coords object also is not taken care about the order.
(While I don't know the development history, this might be discussed before. ) I can show another reason why the coords object is not suitable to determine the X- and Y- axis.
The coords object can take the values that is not match with the value's shape. So I always must prepare the private function as follows:
def _fix_holoviews_about_xarray(dataarray: xr.DataArray) -> xr.DataArray:
"""Helper function to overcome the problem in holoviews.
Args:
dataarray (xr.DataArray): input Dataarray
Returns:
xr.DataArray, whose coordinates is regularly orderd determined by dataarray.dims.
"""
for coord_name in dataarray.coords:
if coord_name not in dataarray.dims:
dataarray = dataarray.drop_vars(str(coord_name))
return dataarray.assign_coords(
coords={dim_name: dataarray.coords[dim_name] for dim_name in dataarray.dims},
)
(Acutally, I found this #6327 issue during resolving #6317 issue. While I can propose the PR to resolve #6317 issue, I believed I must propose the PR for #6327, before it. That's why I send PR #6333.)
As you can see, this is not perfectly same problem of the current issue. But I think these problems have the same root.
Triple check might also be important. However, I believe that it is better to first try to fix the problem and then consider what to do about it than to check and not fix it, which will lead to active development.
Needless to say, this is my own idea. You can freely ignore this.
Thank you for your consideration and discussion.
As the items in a dict object may be (essentially) ordered. I know the current Python dict is, but I believe HoloViews should not rely on it...
The coords object can take the values that is not match with the value's shape.
I think I've encountered this before (coords mismatching dims ordering/shape)
Here, the coords are ordered: lat, lon, time
While the DataArray is ordered time, lat, lon
But I'm not sure there's an elegant solution because each data var might be transposed differently. Here I fake it:
@ahuang11 I also feel there is no "elegant" solution.
However, there is a solution for keeping the holoviews robust. -- (as I said) to make it impossible to plot Datasets directly and use dims for determining the X- and Y- axis, not coords.
Actually only using dims to determine axis can not solve the problem completely. I believe the problem should be solved by applying my PR when the dataarray is used. However, for the dataset, the problem can not completely solved because dataset can have several dataarray objects, whose coords can be inconsistent order.
In my opinion, the library should work w/o error as possible is the first, in general. If the users know the inside of the holoviews like you or check the code in detail , they can make a solution, it may be ad-hoc, but the solution is solution. However, if not so? For me it took several hours to find the root of the problem.
That is why I think the best way is to make it impossible to plot Dataset directly. It is not good to make it look as if a figure can be drawn without any specification in any case, when in fact it cannot always be done. I repeat, by applying my PR, for the dataarray object, such operation can work.
Or another way is that only in the case of Dataset, it should generate an error/message/warning if kdim is not specified. For data array this is not needed if dims is used to determine the X- and Y- axis. Still I'm really wondering coords is used for determining it...
Vanilla xarray plot has interesting behavior; although dims are x, y I think it transposes it.
import numpy as np
import holoviews as hv
import xarray as xr
import panel as pn
hv.extension("bokeh")
matrixdata = np.arange(24.0).reshape((4,6))
x_axis = np.linspace(0,5, 4) # [0,, 1.6666, 3.3333, 5]
y_axis = np.linspace(1, 8, 6) #[1. , 2.4, 3.8, 5.2, 6.6, 8. ]
dataarray_a = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"X": x_axis, "Y": y_axis})
dataarray_b = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"Y": y_axis, "X": x_axis})
dataarray_a.plot()
dataarray_b.plot()
In my opinion, the library should work w/o error as possible is the first, in general. Or another way is that only in the case of Dataset, it should generate an error/message/warning if kdim is not specified.
I don't think this errors out.
dataarray_a = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"X": x_axis, "Y": y_axis})
dataarray_b = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"Y": y_axis, "X": x_axis})
ds = xr.Dataset({"A": dataarray_a, "B": dataarray_b})
hv.Image(ds)
However, if not so? For me it took several hours to find the root of the problem.
Can you share the dataset that caused you an issue?
Here, I would like to point out that both dataarray_a.plot()
and dataarray_b.plot()
is same.
Suppose that:
import numpy as np
import holoviews as hv
import xarray as xr
hv.extension("bokeh")
matrixdata = np.arange(24.0).reshape((4,6))
matrixdata_c = np.arange(54.0).reshape((6,9))
x_axis = np.linspace(0,5, 4) # [0,, 1.6666, 3.3333, 5]
y_axis = np.linspace(1, 8, 6) #[1. , 2.4, 3.8, 5.2, 6.6, 8. ]
w_axis = np.linspace(3, 8, 9) #[3. , 3.625, 4.25 , 4.875, 5.5 , 6.125, 6.75 , 7.375, 8. ]
dataarray_a = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"X": x_axis, "Y": y_axis})
dataarray_b = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"Y": y_axis, "X": x_axis})
dataarray_c = xr.DataArray(matrixdata_c, dims=["Y", "W"], coords={"Y": y_axis, "W": w_axis})
ds_abc = xr.Dataset({"A": dataarray_a, "B": dataarray_b, "C":dataarray_c})
hv.Image(ds_abc)
does not work.
To begin with, it is really ambiguous what hv.Image(ds_abc) should show.
And in the current version,
hv.Image(ds_abc, kdims=["Y", "W"])
got error.
and we could not get what happened fro the message: It said:
DictInterface expects tabular data, for more information on supported datatypes see https://holoviews.org/user_guide/Tabular_Datasets.html because our data is tabular.
But the dataset is tabular! Eventually, I found that
hv.Image(ds_abc, kdims=["Y", "W"], vdim="C")
works. (note that hv.Image(ds_abc, vdims =["C"]
does not work, and the error message is weird, at least for me. I couldn't understand what I must chage.)) So that's why I said, the robust way (to prevent such confusion) is
I said:
To begin with, it is really ambiguous what hv.Image(ds_abc) should show.
Suppose that
ds2 = xr.Dataset({"B": dataarray_b, "A": dataarray_a})
This is "essentially" identical dataset object with ds
.
But hv.Image(ds)
and hv.Image(ds2)
shows different result, which is weird at least for me.
I believe that the users usually don't care about the order of the dataarray objects when build the Dataset, because it can set by Mappable object. (I mean that this is not only dict object. If the Mappable object is used, it would be accepted even if it does not have the information about the order.)
@jbednar
Python dicts have been ordered for the past sixteen years, and I am not aware of any proposal to ever make them not be ordered again, so I believe it is entirely valid to rely on their ordering when appropriate.
Initially, I felt that your comment was not well justified but I could not get the point at that moment. Now I believe I can understand the better.
When we set the coords
of xr.DataArray
, the dict object is not necessary. Mappable
object is sufficient. Of course, the dict object is the representative Mappable
object. However, we can use another custom Mappable
object to build the coords.
For the further information:
https://docs.xarray.dev/en/stable/generated/xarray.Coordinates.html#xarray.Coordinates
coords (dict-like, optional) – Mapping where keys are coordinate names and values are objects that can be converted into a Variable object (see as_variable()). If another Coordinates object is passed, its indexes will be added to the new created object.
The point is dict-like, not dict.
Could you let me know your thinking?
So the argument here is that coords isn't in fact a dict
type, but a collections.abc.Mapping
type, and mappings might not be ordered? That's an issue; if xarray accepts (and is in some normal circumstance is given) a non-ordered mapping type, then the ordering would be poorly defined. All the mapping types listed at https://docs.python.org/3/glossary.html#term-mapping are ordered, but I can't find a definitive statement about either whether all mapping types are guaranteed to be ordered (I suspect not) or whether xarray expects the mapping type to be ordered.
In any case, having behavior different from .plot()
is indeed suspicious, because it suggests we are relying on different conventions than others use for interpreting Xarray objects. That's what we need to investigate. Until such investigation, I don't think the PR should be merged; I have no idea if that's the right approach, and do not want to change behavior lightly. If you can point us to definitive docs in Xarray about how the dimensions/variables/coordinates should be interpreted, that would be helpful; otherwise we have to go by practice.
BTW, have you tried .hvplot
(using hvPlot)? Does it differ from .plot()
? If so that's a very clear issue.
whether all mapping types are guaranteed to be ordered (I suspect not)
I believe it clearly is not.
whether xarray expects the mapping type to be ordered.
I believe it is not, because there is no need to expect such a thing
I also feel that the behavior different from .plot() is indeed suspicious. On the other hand I would like to point out the more important thing: dataarray_a.plot() and daarray_b.plot() output the same result. While the output may be different from the holoviews's one, it is consistent and this consistency originate from the X- and Y- axes determined by the dims object, not coords. I think if the output of holoviews is consistent, it can be accepted, even if it is different from the xarray's one.
I'm not saying we should change it lightly. I'm saying that using coord for determining X- and Y- axis essentially isn't rational.
I haven't tried hvplot about this problem so deeply. I think hvplot is the upper layer of the holoviews. And by using hvplot alone is not sufficient for what I would like to do.
Please think it more deeply.
Investigated it deeper, and there seems to be a lot of inconsistencies between xarray, hvplot, and HoloViews default plotting; results here: https://anaconda.cloud/share/notebooks/6e148e19-aafa-44f7-b718-04dbbe0dee9a/preview
xarray plotting docs AFAIK doesn't mention much of its conventions used https://docs.xarray.dev/en/latest/user-guide/plotting.html
Digging deeper, docstring mentions
x : Hashable or None, optional
Coordinate for *x* axis. If ``None``, use ``darray.dims[1]``.
y : Hashable or None, optional
Coordinate for *y* axis. If ``None``, use ``darray.dims[0]``.
Here's how the code for doing so https://github.com/pydata/xarray/blob/main/xarray/plot/utils.py#L376-L412
The issue should not be closed, because it has not been resolved at al by #6351 .
Could you let me know how to reopen this?
It was automatically closed by merging the PR.
My argument (that dim should be used instead of coords), turned out to be correct.
https://github.com/pydata/xarray/discussions/9295
I think you should rethink how you handle Dataset before you create further confusion.
BTW, I looked at the codes, but I don't fully understand what packed
means in xarray.py. Can you give me an example of an xarray where packed == True?
I'm afraid you would not accept any PR that cannot pass fully your tests. As the modification of the code to fix the problem would be deep level, I feel that I need to modify the code from the very deep level to resolve this issue.
Thanks for contacting us! Please read and follow these instructions carefully, then delete this introductory text to keep your issue easy to read. Note that the issue tracker is NOT the place for usage questions and technical assistance; post those at Discourse instead. Issues without the required information below may be closed immediately.
ALL software version info
Python 3.11 holoviews 1.19.1 Bokeh 3.5.0
Description of expected behavior and the observed behavior
Expect the same output, but actually not.
I had DataArrays that worked well in my script and others that did not, and for a long time, I couldn't understand why. Finally, I have figured it out. I consider this a rather serious issue and would like to hear your opinion.
Suppose the following two (essentially) identical xarray.DataArrays:
I believe that hv.Image(dataarray_a) and hv.Image(dataarray_b) should be identical because both have the same "dims", which is a tuple. But they actually are not.
The horizontal axis of hv.Image(dataarray_a) is "X", while that of hv.Image(dataarray_b) is "Y".
As the items in a dict object may be (essentially) ordered. I know the current Python dict is, but I believe HoloViews should not rely on it...
Is this the holoviews specifiacation? If yes, why was this choice made? And let me know to avoid the problem (In other wards, please let me know how to achieve the same output).