pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

DataArrays should display their coordinates in the natural order #712

Open anntzer opened 8 years ago

anntzer commented 8 years ago

Consider

from collections import *
import numpy as np
from xray import *

d1 = DataArray(np.empty((2, 2)), coords=OrderedDict([("foo", [0, 1]), ("bar", [0, 1])]))
d2 = DataArray(np.empty((2, 2)), coords=OrderedDict([("bar", [0, 1]), ("foo", [0, 1])]))

ds = Dataset({"d1": d1, "d2": d2})

print(ds.d1)
print(ds.d2)

This outputs

<xray.DataArray 'd1' (foo: 2, bar: 2)>
array([[  6.91516848e-310,   1.64244654e-316],
       [  6.91516881e-310,   6.91516881e-310]])
Coordinates:
  * foo      (foo) int64 0 1
  * bar      (bar) int64 0 1
<xray.DataArray 'd2' (bar: 2, foo: 2)>
array([[  1.59987863e-316,   6.91516883e-310],
       [  6.91515690e-310,   2.12670320e-316]])
Coordinates:
  * foo      (foo) int64 0 1
  * bar      (bar) int64 0 1

I understand that internally both DataArrays use the same coords object and thus the same coords order, but it would be helpful if, when printing d2 by itself, the coordinates were printed in the natural order ("bar", "foo"). In particular, when working interactively, the list of coordinates at the end of the repr is the easiest thing to spot, and thus the most helpful for knowing how to format the call to array.loc[...].

shoyer commented 8 years ago

I think this may have been fixed by the recent rewrite of DataArray internals. On master, I have:

In [2]: d1
Out[2]:
<xray.DataArray (foo: 2, bar: 2)>
array([[  0.00000000e+000,   0.00000000e+000],
       [  2.15725662e-314,   2.15893204e-314]])
Coordinates:
  * foo      (foo) int64 0 1
  * bar      (bar) int64 0 1

In [3]: d2
Out[3]:
<xray.DataArray (bar: 2, foo: 2)>
array([[  0.00000000e+000,   0.00000000e+000],
       [  2.15906985e-314,   2.14458868e-314]])
Coordinates:
  * bar      (bar) int64 0 1
  * foo      (foo) int64 0 1

anntzer commented 8 years ago

Awesome, thanks. Any plans for a release soon? Feel free to close the issue.

shoyer commented 8 years ago

yes, in the next week, hopefully.

shoyer commented 8 years ago

This should be fixed in v0.7.0... please reopen if it resurfaces.

anntzer commented 8 years ago

Requesting a reopen: this issue is present again in 0.7.2.

shoyer commented 8 years ago

OK, I didn't read your first post carefully last time. Your complaint was about the order of coordinates in ds.d1 and ds.d2, not the original DataArrays. So this is a more subtle issue than I thought.

We could add some sort of ad-hoc adjustment to the order in which we display coordinates, but I'm reluctant because it's not obvious to me what that "correct" order would be. For example, you can directly supply the coords argument as a mapping in any arbitrary order when constructing a DataArray.

I suppose one principled choice would be to always display coordinates corresponding to dimensions first in lists of coordinates, and to always display them in the same order as dimensions. If we do this, it should be consistent between both DataArray and Dataset.
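
A minimal sketch of that ordering rule in plain Python, for illustration only. The helper name display_order is hypothetical and is not part of xarray; it takes dimension names and coordinate names as plain sequences and assumes nothing about how xarray builds its repr.

```python
from typing import Hashable, Iterable, List


def display_order(
    dims: Iterable[Hashable], coord_names: Iterable[Hashable]
) -> List[Hashable]:
    """Return coordinate names with dimension coordinates first, in dims order.

    Hypothetical helper for illustration; not part of xarray.
    """
    dims = list(dims)
    coord_names = list(coord_names)
    # Dimension coordinates: names that are also dimensions, listed in the
    # same order as the dimensions themselves.
    dim_coords = [d for d in dims if d in coord_names]
    # Non-dimension coordinates keep the relative order they were stored in.
    other_coords = [c for c in coord_names if c not in dims]
    return dim_coords + other_coords


# Mirrors the example in this thread: the shared coords are stored as
# (foo, bar), while d2 has dimensions (bar, foo).
print(display_order(("foo", "bar"), ["foo", "bar"]))  # ['foo', 'bar']
print(display_order(("bar", "foo"), ["foo", "bar"]))  # ['bar', 'foo']
```

With a real DataArray named da, the same helper could presumably be called as display_order(da.dims, da.coords), since iterating over da.coords yields the coordinate names.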

stale[bot] commented 5 years ago

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity. If this issue remains relevant, please comment here; otherwise it will be marked as closed automatically.

anntzer commented 5 years ago

The issue is still relevant.

For the record, the repro code is now (e.g.)

In [4]: from collections import *
   ...: import numpy as np
   ...: from xarray import *
   ...:
   ...: d1 = DataArray(np.empty((2, 2)), coords=OrderedDict([("foo", [0, 1]), ("bar", [0, 1])]), dims=["foo", "bar"])
   ...: d2 = DataArray(np.empty((2, 2)), coords=OrderedDict([("bar", [0, 1]), ("foo", [0, 1])]), dims=["bar", "foo"])
   ...:
   ...: ds = Dataset({"d1": d1, "d2": d2})
   ...:
   ...: print(ds.d1)
   ...: print(ds.d2)
<xarray.DataArray 'd1' (foo: 2, bar: 2)>
array([[4.665651e-310, 0.000000e+000],
       [4.940656e-324,           nan]])
Coordinates:
  * foo      (foo) int64 0 1
  * bar      (bar) int64 0 1
<xarray.DataArray 'd2' (bar: 2, foo: 2)>
array([[4.66565e-310, 0.00000e+000],
       [4.94066e-324,          nan]])
Coordinates:
  * foo      (foo) int64 0 1
  * bar      (bar) int64 0 1

jhamman commented 5 years ago

@anntzer - would you be interested in working on this?

anntzer commented 5 years ago

I don't know anything about the internals of xarray, and to be honest I rarely use it anymore. The issue remains valid (which is why I posted the reply above), but it's not going to be the end of the world if you close it as wontfix.

keewis commented 3 years ago

what should we do about this? We did touch the subject in #4409, but decided to keep the order the coordinates were passed in rather than sorting by dimension (or alphabetically). I think there's a lot of confusion about the difference between the dimensions in the summary line of DataArray objects and the order in the coordinates section.

A fix for #4515 might make sorting by dimension order much more important.

dcherian commented 3 years ago

#4515 is consistent with this comment up above:

display coordinates corresponding to dimensions first in lists of coordinates, and to always display them in the same order as dimensions.

keewis commented 3 years ago

true, it seems I didn't read this issue carefully enough