holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License
2.69k stars 402 forks source link

(Possibly) wrong way for determination about X- and Y- axis in hv.Image #6327

Open arafune opened 2 months ago

arafune commented 2 months ago

Thanks for contacting us! Please read and follow these instructions carefully, then delete this introductory text to keep your issue easy to read. Note that the issue tracker is NOT the place for usage questions and technical assistance; post those at Discourse instead. Issues without the required information below may be closed immediately.

ALL software version info

Python 3.11 holoviews 1.19.1 Bokeh 3.5.0

Description of expected behavior and the observed behavior

Expect the same output, but actually not.

I had DataArrays that worked well in my script and others that did not, and for a long time, I couldn't understand why. Finally, I have figured it out. I consider this a rather serious issue and would like to hear your opinion.

Suppose the following two (essentially) identical xarray.DataArrays:

import numpy as np
import holoviews as hv
import xarray as xr

hv.extension("bokeh")
matrixdata = np.arange(24.0).reshape((4,6))
x_axis = np.linspace(0,5, 4) # [0,, 1.6666, 3.3333, 5]
y_axis = np.linspace(1, 8, 6) #[1. , 2.4, 3.8, 5.2, 6.6, 8. ]

dataarray_a = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"X": x_axis, "Y": y_axis})
dataarray_b = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"Y": y_axis, "X": x_axis})

I believe that hv.Image(dataarray_a) and hv.Image(dataarray_b) should be identical because both have the same "dims", which is a tuple. But they actually are not.

The horizontal axis of hv.Image(dataarray_a) is "X", while that of hv.Image(dataarray_b) is "Y".

As the items in a dict object may be (essentially) ordered. I know the current Python dict is, but I believe HoloViews should not rely on it...

Is this the holoviews specifiacation? If yes, why was this choice made? And let me know to avoid the problem (In other wards, please let me know how to achieve the same output).

arafune commented 2 months ago

hv.QuadMesh has same problem.

Further,

While dataarray_a.plot() and daarray_b.plot() show the same ouptuts,

hv.extension("matplotlib")
hv.Image(dataarray_a)
hv.Image(dataarray_b)

shows different results.

jbednar commented 2 months ago

Thanks for pointing out this behavior. In your opinion, what should be the determining factor about what goes on the horizontal axis and what goes on the vertical? Should it be the order of the dims, the order of the coords, or some other convention?

arafune commented 2 months ago

Thanks for pointing out this behavior. In your opinion, what should be the determining factor about what goes on the horizontal axis and what goes on the vertical? Should it be the order of the dims, the order of the coords, or some other convention?

I believe it should follow the order of the dims, which is a sequence object. If the dims of the xarray.DataArray were a set like object, more discussion and consideration might be needed. However, as it is a tuple base, we do not think there is much room for discussion.

arafune commented 2 months ago

I feel that the problem would be that there is no policy for xarrays to plot by using holoviews. I think it is important to determine what that policy is.

Personally, I think the simplest and most robust way to make holoviews robust is to make it impossible to plot Datasets directly (only DataArray can be used.)

Enforcing the rule that users cannot pass Datasets as arguments may be seen as reducing convenience. However, by limiting it to DataArray, it can be used without the need to specify many optional arguments.

Please think about it.

jbednar commented 1 month ago

As the items in a dict object may be (essentially) ordered. I know the current Python dict is, but I believe HoloViews should not rely on it...

Python dicts have been ordered for the past sixteen years, and I am not aware of any proposal to ever make them not be ordered again, so I believe it is entirely valid to rely on their ordering when appropriate.

the simplest and most robust way to make holoviews robust is to make it impossible to plot Datasets directly (only DataArray can be used.)

I personally consider plotting Datasets to be useful, so I'd be against removing that capability in any case, but more importantly it's existing capability that would cause problems for existing workflows if we removed it, so I'm -1 on this suggestion.

I feel that the problem would be that there is no policy for xarrays to plot by using holoviews. I think it is important to determine what that policy is.

So far we have tried to follow what Xarray and the CF conventions that are used with Xarray specify, as you can see in previous issues raised about this topic where we changed behavior to what it is presently. But I do not think that our documentation clearly lays out that policy and the conventions we respect, and so I think we need to review the current behavior, see if we continue to agree that it is behaving as intended, and document that more clearly. Certainly finding behavior for hv.Image that differs from da.plot is something we need to investigate and triple-check. In the meantime, I do not think that https://github.com/holoviz/holoviews/issues/6327 should be merged.

arafune commented 1 month ago

@jbednar

I understand your point of view. I'm just a very novice user of holoviews, and I didn't know the history of this library. Thus, it might be natural to ignore my request.
However, I still wondering why coords object is used to determine the X- and Y- axis.

Furthermore, though I know the current situation about the dict object in python about the order, I don't think we should rely on it. The dict object is not oriented for the ordered process, and coords object also is not taken care about the order.

(While I don't know the development history, this might be discussed before. ) I can show another reason why the coords object is not suitable to determine the X- and Y- axis.

The coords object can take the values that is not match with the value's shape. So I always must prepare the private function as follows:  

def _fix_holoviews_about_xarray(dataarray: xr.DataArray) -> xr.DataArray:
    """Helper function to overcome the problem in holoviews.

    Args:
        dataarray (xr.DataArray): input Dataarray

    Returns:
        xr.DataArray, whose coordinates is regularly orderd determined by dataarray.dims.
    """
    for coord_name in dataarray.coords:
        if coord_name not in dataarray.dims:
            dataarray = dataarray.drop_vars(str(coord_name))
    return dataarray.assign_coords(
        coords={dim_name: dataarray.coords[dim_name] for dim_name in dataarray.dims},
    )

(Acutally, I found this #6327 issue during resolving #6317 issue. While I can propose the PR to resolve #6317 issue, I believed I must propose the PR for #6327, before it. That's why I send PR #6333.)

As you can see, this is not perfectly same problem of the current issue. But I think these problems have the same root.

arafune commented 1 month ago

Triple check might also be important. However, I believe that it is better to first try to fix the problem and then consider what to do about it than to check and not fix it, which will lead to active development.

Needless to say, this is my own idea. You can freely ignore this.

Thank you for your consideration and discussion.

ahuang11 commented 1 month ago

As the items in a dict object may be (essentially) ordered. I know the current Python dict is, but I believe HoloViews should not rely on it...

The coords object can take the values that is not match with the value's shape.

I think I've encountered this before (coords mismatching dims ordering/shape)

Here, the coords are ordered: lat, lon, time

image

While the DataArray is ordered time, lat, lon

image

But I'm not sure there's an elegant solution because each data var might be transposed differently. Here I fake it:

image
arafune commented 1 month ago

@ahuang11 I also feel there is no "elegant" solution.

However, there is a solution for keeping the holoviews robust. -- (as I said) to make it impossible to plot Datasets directly and use dims for determining the X- and Y- axis, not coords.

Actually only using dims to determine axis can not solve the problem completely. I believe the problem should be solved by applying my PR when the dataarray is used. However, for the dataset, the problem can not completely solved because dataset can have several dataarray objects, whose coords can be inconsistent order.

In my opinion, the library should work w/o error as possible is the first, in general. If the users know the inside of the holoviews like you or check the code in detail , they can make a solution, it may be ad-hoc, but the solution is solution. However, if not so? For me it took several hours to find the root of the problem.

That is why I think the best way is to make it impossible to plot Dataset directly. It is not good to make it look as if a figure can be drawn without any specification in any case, when in fact it cannot always be done. I repeat, by applying my PR, for the dataarray object, such operation can work.

Or another way is that only in the case of Dataset, it should generate an error/message/warning if kdim is not specified. For data array this is not needed if dims is used to determine the X- and Y- axis. Still I'm really wondering coords is used for determining it...

ahuang11 commented 1 month ago

Vanilla xarray plot has interesting behavior; although dims are x, y I think it transposes it.

import numpy as np
import holoviews as hv
import xarray as xr
import panel as pn

hv.extension("bokeh")
matrixdata = np.arange(24.0).reshape((4,6))
x_axis = np.linspace(0,5, 4) # [0,, 1.6666, 3.3333, 5]
y_axis = np.linspace(1, 8, 6) #[1. , 2.4, 3.8, 5.2, 6.6, 8. ]

dataarray_a = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"X": x_axis, "Y": y_axis})
dataarray_b = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"Y": y_axis, "X": x_axis})

dataarray_a.plot()
dataarray_b.plot()
image

In my opinion, the library should work w/o error as possible is the first, in general. Or another way is that only in the case of Dataset, it should generate an error/message/warning if kdim is not specified.

I don't think this errors out.

dataarray_a = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"X": x_axis, "Y": y_axis})
dataarray_b = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"Y": y_axis, "X": x_axis})
ds = xr.Dataset({"A": dataarray_a, "B": dataarray_b})
hv.Image(ds)

However, if not so? For me it took several hours to find the root of the problem.

Can you share the dataset that caused you an issue?

arafune commented 1 month ago

Here, I would like to point out that both dataarray_a.plot() and dataarray_b.plot() is same.


Suppose that:

import numpy as np
import holoviews as hv
import xarray as xr

hv.extension("bokeh")
matrixdata = np.arange(24.0).reshape((4,6))
matrixdata_c = np.arange(54.0).reshape((6,9))
x_axis = np.linspace(0,5, 4) # [0,, 1.6666, 3.3333, 5]
y_axis = np.linspace(1, 8, 6) #[1. , 2.4, 3.8, 5.2, 6.6, 8. ]
w_axis = np.linspace(3, 8, 9) #[3.   , 3.625, 4.25 , 4.875, 5.5  , 6.125, 6.75 , 7.375, 8.  ]

dataarray_a = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"X": x_axis, "Y": y_axis})
dataarray_b = xr.DataArray(matrixdata, dims=["X", "Y"], coords={"Y": y_axis, "X": x_axis})
dataarray_c = xr.DataArray(matrixdata_c, dims=["Y", "W"], coords={"Y": y_axis, "W": w_axis})

ds_abc = xr.Dataset({"A": dataarray_a, "B": dataarray_b, "C":dataarray_c})
hv.Image(ds_abc)

does not work.

To begin with, it is really ambiguous what hv.Image(ds_abc) should show.

And in the current version,

hv.Image(ds_abc, kdims=["Y", "W"])

got error.

and we could not get what happened fro the message: It said:

DictInterface expects tabular data, for more information on supported datatypes see https://holoviews.org/user_guide/Tabular_Datasets.html because our data is tabular.

But the dataset is tabular! Eventually, I found that

hv.Image(ds_abc, kdims=["Y", "W"], vdim="C")

works. (note that hv.Image(ds_abc, vdims =["C"] does not work, and the error message is weird, at least for me. I couldn't understand what I must chage.)) So that's why I said, the robust way (to prevent such confusion) is

arafune commented 1 month ago

I said:

To begin with, it is really ambiguous what hv.Image(ds_abc) should show.

Suppose that

ds2 = xr.Dataset({"B": dataarray_b, "A": dataarray_a})

This is "essentially" identical dataset object with ds. But hv.Image(ds) and hv.Image(ds2) shows different result, which is weird at least for me.

I believe that the users usually don't care about the order of the dataarray objects when build the Dataset, because it can set by Mappable object. (I mean that this is not only dict object. If the Mappable object is used, it would be accepted even if it does not have the information about the order.)

arafune commented 1 month ago

@jbednar

Python dicts have been ordered for the past sixteen years, and I am not aware of any proposal to ever make them not be ordered again, so I believe it is entirely valid to rely on their ordering when appropriate.

Initially, I felt that your comment was not well justified but I could not get the point at that moment. Now I believe I can understand the better.

When we set the coords of xr.DataArray, the dict object is not necessary. Mappable object is sufficient. Of course, the dict object is the representative Mappable object. However, we can use another custom Mappable object to build the coords.

arafune commented 1 month ago

For the further information:

https://docs.xarray.dev/en/stable/generated/xarray.Coordinates.html#xarray.Coordinates

coords (dict-like, optional) – Mapping where keys are coordinate names and values are objects that can be converted into a Variable object (see as_variable()). If another Coordinates object is passed, its indexes will be added to the new created object.

The point is dict-like, not dict.

Could you let me know your thinking?

jbednar commented 1 month ago

So the argument here is that coords isn't in fact a dict type, but a collections.abc.Mapping type, and mappings might not be ordered? That's an issue; if xarray accepts (and is in some normal circumstance is given) a non-ordered mapping type, then the ordering would be poorly defined. All the mapping types listed at https://docs.python.org/3/glossary.html#term-mapping are ordered, but I can't find a definitive statement about either whether all mapping types are guaranteed to be ordered (I suspect not) or whether xarray expects the mapping type to be ordered.

In any case, having behavior different from .plot() is indeed suspicious, because it suggests we are relying on different conventions than others use for interpreting Xarray objects. That's what we need to investigate. Until such investigation, I don't think the PR should be merged; I have no idea if that's the right approach, and do not want to change behavior lightly. If you can point us to definitive docs in Xarray about how the dimensions/variables/coordinates should be interpreted, that would be helpful; otherwise we have to go by practice.

BTW, have you tried .hvplot (using hvPlot)? Does it differ from .plot()? If so that's a very clear issue.

arafune commented 1 month ago

whether all mapping types are guaranteed to be ordered (I suspect not)

I believe it clearly is not.

whether xarray expects the mapping type to be ordered.

I believe it is not, because there is no need to expect such a thing

I also feel that the behavior different from .plot() is indeed suspicious. On the other hand I would like to point out the more important thing: dataarray_a.plot() and daarray_b.plot() output the same result. While the output may be different from the holoviews's one, it is consistent and this consistency originate from the X- and Y- axes determined by the dims object, not coords. I think if the output of holoviews is consistent, it can be accepted, even if it is different from the xarray's one.

I'm not saying we should change it lightly. I'm saying that using coord for determining X- and Y- axis essentially isn't rational.

I haven't tried hvplot about this problem so deeply. I think hvplot is the upper layer of the holoviews. And by using hvplot alone is not sufficient for what I would like to do.

Please think it more deeply.

ahuang11 commented 1 month ago

Investigated it deeper, and there seems to be a lot of inconsistencies between xarray, hvplot, and HoloViews default plotting; results here: https://anaconda.cloud/share/notebooks/6e148e19-aafa-44f7-b718-04dbbe0dee9a/preview

xarray plotting docs AFAIK doesn't mention much of its conventions used https://docs.xarray.dev/en/latest/user-guide/plotting.html

Digging deeper, docstring mentions

    x : Hashable or None, optional
        Coordinate for *x* axis. If ``None``, use ``darray.dims[1]``.
    y : Hashable or None, optional
        Coordinate for *y* axis. If ``None``, use ``darray.dims[0]``.

Here's how the code for doing so https://github.com/pydata/xarray/blob/main/xarray/plot/utils.py#L376-L412

arafune commented 1 month ago

The issue should not be closed, because it has not been resolved at al by #6351 .

arafune commented 1 month ago

Could you let me know how to reopen this?

hoxbro commented 1 month ago

It was automatically closed by merging the PR.

arafune commented 1 month ago

My argument (that dim should be used instead of coords), turned out to be correct.

https://github.com/pydata/xarray/discussions/9295

I think you should rethink how you handle Dataset before you create further confusion.

BTW, I looked at the codes, but I don't fully understand what packed means in xarray.py. Can you give me an example of an xarray where packed == True?

I'm afraid you would not accept any PR that cannot pass fully your tests. As the modification of the code to fix the problem would be deep level, I feel that I need to modify the code from the very deep level to resolve this issue.