pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

DataTree repr showing inherited coordinates as non-inherited #9499

Closed: TomNicholas closed this issue 1 month ago

TomNicholas commented 1 month ago

What is your issue?

The example DataTree under "DataTree Inheritance" on the Data Structures docs page doesn't display the way I expected it to.

It currently looks like this:

Out[104]: 
<xarray.DataTree>
Group: /
│   Dimensions:  (time: 2)
│   Coordinates:
│     * time     (time) <U7 56B '2022-01' '2023-01'
├── Group: /weather
│   │   Dimensions:     (station: 6, time: 2)
│   │   Coordinates:
│   │     * time        (time) <U7 56B '2022-01' '2023-01'
│   │     * station     (station) <U1 24B 'a' 'b' 'c' 'd' 'e' 'f'
│   │   Data variables:
│   │       wind_speed  (time, station) float64 96B 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
│   │       pressure    (time, station) float64 96B 3.0 3.0 3.0 3.0 ... 3.0 3.0 3.0 3.0
│   └── Group: /weather/temperature
│           Dimensions:          (time: 2, station: 6)
│           Data variables:
│               air_temperature  (time, station) float64 96B 4.0 4.0 4.0 4.0 ... 4.0 4.0 4.0
│               dewpoint         (time, station) float64 96B 5.0 5.0 5.0 5.0 ... 5.0 5.0 5.0
└── Group: /satellite
        Dimensions:     (lat: 3, lon: 3, time: 2)
        Coordinates:
          * time        (time) <U7 56B '2022-01' '2023-01'
          * lat         (lat) int64 24B 10 20 30
          * lon         (lon) int64 24B -100 -80 -60
        Data variables:
            infrared    (time, lon, lat) float64 144B 6.0 6.0 6.0 6.0 ... 6.0 6.0 6.0
            true_color  (time, lon, lat) float64 144B 7.0 7.0 7.0 7.0 ... 7.0 7.0 7.0

Notice that an identical time coordinate is displayed in multiple child groups.

But I expected it to look like this:

Out[104]: 
<xarray.DataTree>
Group: /
│   Dimensions:  (time: 2)
│   Coordinates:
│     * time     (time) <U7 56B '2022-01' '2023-01'
├── Group: /weather
│   │   Dimensions:     (station: 6, time: 2)
│   │   Coordinates:
│   │     * station     (station) <U1 24B 'a' 'b' 'c' 'd' 'e' 'f'
│   │   Data variables:
│   │       wind_speed  (time, station) float64 96B 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
│   │       pressure    (time, station) float64 96B 3.0 3.0 3.0 3.0 ... 3.0 3.0 3.0 3.0
│   └── Group: /weather/temperature
│           Dimensions:          (time: 2, station: 6)
│           Data variables:
│               air_temperature  (time, station) float64 96B 4.0 4.0 4.0 4.0 ... 4.0 4.0 4.0
│               dewpoint         (time, station) float64 96B 5.0 5.0 5.0 5.0 ... 5.0 5.0 5.0
└── Group: /satellite
        Dimensions:     (lat: 3, lon: 3, time: 2)
        Coordinates:
          * lat         (lat) int64 24B 10 20 30
          * lon         (lon) int64 24B -100 -80 -60
        Data variables:
            infrared    (time, lon, lat) float64 144B 6.0 6.0 6.0 6.0 ... 6.0 6.0 6.0
            true_color  (time, lon, lat) float64 144B 7.0 7.0 7.0 7.0 ... 7.0 7.0 7.0

which only has time defined in the root group. Based on #9463, I expected inheritance onto child nodes to be implicit, so the same coordinate would not be explicitly displayed twice.
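To make the expected display rule concrete, here is a minimal pure-Python sketch. It is a toy model, not xarray's implementation: the `Node` class, its `coords` field, and `displayed_coords` are hypothetical names introduced only for illustration, and it assumes (without confirmation from the issue) that the child groups also store their own copy of `time`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree node; NOT xarray's DataTree."""

    name: str
    coords: set = field(default_factory=set)  # coordinate names stored on this node
    parent: Optional["Node"] = None

    def inherited_coords(self) -> set:
        """Coordinate names defined on any ancestor of this node."""
        found = set()
        node = self.parent
        while node is not None:
            found |= node.coords
            node = node.parent
        return found

    def displayed_coords(self) -> list:
        """Names listed under 'Coordinates' for this node in a full-tree view.

        Anything already defined on an ancestor is treated as implicit
        and is therefore left out.
        """
        return sorted(self.coords - self.inherited_coords())


# Assumed layout, mirroring the groups in the repr above.
root = Node("/", {"time"})
weather = Node("weather", {"time", "station"}, parent=root)
temperature = Node("temperature", set(), parent=weather)
satellite = Node("satellite", {"time", "lat", "lon"}, parent=root)

for node in (root, weather, temperature, satellite):
    print(node.name, node.displayed_coords())
# / ['time']
# weather ['station']
# temperature []
# satellite ['lat', 'lon']
```

Under this rule `time` appears only once, on the root group, which matches the expected output shown above.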

I also don't understand why the current repr does not have time displayed in the /weather/temperature node.

It's possible that this is caused by an interaction between #9475 and .from_dict (I haven't looked into this deeply yet), but I wanted to make a note that we should come back to check this definitely works once #9475 is resolved.

cc @shoyer @flamingbear

shoyer commented 1 month ago

Indeed, something is definitely going wrong in the from_dict constructor.

Look at what dt['weather'] looks like -- it's even worse, with a duplicate "time" coordinate!

In [7]: dt2['weather']
Out[7]:
<xarray.DataTree 'weather'>
Group: /weather
│   Dimensions:     (time: 2, station: 6)
│   Coordinates:
│     * time        (time) <U7 56B '2022-01' '2023-01'
│     * station     (station) <U1 24B 'a' 'b' 'c' 'd' 'e' 'f'
│   Inherited coordinates:
│     * time        (time) <U7 56B '2022-01' '2023-01'
│   Data variables:
│       wind_speed  (time, station) float64 96B 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
│       pressure    (time, station) float64 96B 3.0 3.0 3.0 3.0 ... 3.0 3.0 3.0 3.0
└── Group: /weather/temperature
        Dimensions:          (time: 2, station: 6)
        Data variables:
            air_temperature  (time, station) float64 96B 4.0 4.0 4.0 4.0 ... 4.0 4.0 4.0
            dewpoint         (time, station) float64 96B 5.0 5.0 5.0 5.0 ... 5.0 5.0 5.0
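One way to catch this kind of duplication in a test is to scan the repr text for names listed under both "Coordinates" and "Inherited coordinates". The sketch below does not use xarray at all; it parses a pasted excerpt of the output above, and the helper name `coord_sections` is hypothetical. It is a rough text-based check under those assumptions, not how xarray's own test suite works.

```python
# Excerpt of the repr shown above, pasted as plain text.
REPR = """\
Group: /weather
│   Dimensions:     (time: 2, station: 6)
│   Coordinates:
│     * time        (time) <U7 56B '2022-01' '2023-01'
│     * station     (station) <U1 24B 'a' 'b' 'c' 'd' 'e' 'f'
│   Inherited coordinates:
│     * time        (time) <U7 56B '2022-01' '2023-01'
│   Data variables:
│       wind_speed  (time, station) float64 96B 2.0 2.0 2.0 2.0 ... 2.0 2.0 2.0 2.0
│       pressure    (time, station) float64 96B 3.0 3.0 3.0 3.0 ... 3.0 3.0 3.0 3.0
"""


def coord_sections(text):
    """Map each section header to the variable names listed under it.

    Only the first group in `text` is parsed; parsing stops at the next
    'Group:' line.
    """
    sections = {}
    current = None
    groups_seen = 0
    for raw in text.splitlines():
        # Drop the tree-drawing characters and indentation.
        line = raw.lstrip("│├└─ ").rstrip()
        if line.startswith("Group:"):
            groups_seen += 1
            if groups_seen > 1:
                break
            continue
        if line.endswith(":"):
            # A bare header such as 'Coordinates:' starts a new section.
            current = line[:-1]
            sections[current] = []
        elif current is not None and line:
            # Variable lines look like '* time  (time) ...' or 'pressure  (...)'.
            sections[current].append(line.lstrip("* ").split()[0])
    return sections


sections = coord_sections(REPR)
duplicates = set(sections["Coordinates"]) & set(sections["Inherited coordinates"])
print(sorted(duplicates))
# ['time']
```

An empty `duplicates` set would indicate that no coordinate is shown as both local and inherited for that group.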