xarray-contrib / cf-xarray

an accessor for xarray objects that interprets CF attributes
https://cf-xarray.readthedocs.io/
Apache License 2.0
152 stars 39 forks source link

CF accessor adds grid mapping as coordinate #513

Open juseg opened 1 month ago

juseg commented 1 month ago

Possibly a duplicate of #357 or a discussion topic. Opinions welcomed.

Description

Since cf_xarray == 0.8.0 (#391), selecting variables by standard name with ds.cf[standard_name] results in a different set of coordinates than selecting them by their short name with ds[short_name]. This prevents re-injecting these variables in a compatible (and even the original) dataset using Dataset.assign() (and possibly other xarray merge operations). Instead trying to Dataset.assign() a cf-picked, georeferenced variable into a raises a MergeError.

Could you help me to understand if this is a design choice or a bug? In my case it comes as a breaking change and I am not exactly sure how to go about (as a cf-xarray fix or downstream in https://github.com/juseg/hyoga/issues/73). Happy to contribute as I can.

Miminal example

import xarray as xr
import cf_xarray

# make grid-mapped fake data (or use xr.open_dataset)
ds = xr.Dataset({
    'tas': ((), None, {
        'grid_mapping': 'crs',
        'standard_name': 'air_temperature'}),
    'crs': ((), None, {
        'grid_mapping_name': 'latitude_longitude'})})

# ds and ds.cf variables have different coordinates
print('crs' in ds['tas'].coords)  # False
print('crs' in ds.cf['air_temperature'].coords)  # True

# assign a cf-picked variable (only works on cf_xarray < 0.8.0)
ds.assign(new=ds.cf['air_temperature'])  # xarray.core.merge.MergeError

Error message

Traceback (most recent call last):
  File "/home/julien/git/code/hyoga/bugs/cfxr08.py", line 17, in <module>
    ds.assign(new=ds.cf['air_temperature'])  # xarray.core.merge.MergeError
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/xarray/core/dataset.py", line 6080, in assign
    data.update(results)
  File "/usr/lib/python3.12/site-packages/xarray/core/dataset.py", line 4946, in update
    merge_result = dataset_update_method(self, other)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/xarray/core/merge.py", line 1104, in dataset_update_method
    return merge_core(
           ^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/xarray/core/merge.py", line 772, in merge_core
    raise MergeError(
xarray.core.merge.MergeError: unable to determine if these variables should be coordinates or not in the merged result: {'crs'}
dcherian commented 1 month ago

This is a design choice :) .

Many users complain that xarray's default behaviour isn't what cf-xarray currently does.

Some solutions:

  1. Extract the variable name manually: ds[*ds.cf.standard_names['air_temperature']]
  2. Use the underlying variable object: ds.assign(a=ds.cf['air_temperature'].variable)

xarray.core.merge.MergeError: unable to determine if these variables should be coordinates or not in the merged result: {'crs'}

I also think this should just set crs as a coord. Not sure if there is an xarray issue for this.

juseg commented 1 month ago

This is a design choice :) .

Thank you for the clarification and code samples! :-) I will settle on 2. I think 1 needs double brackets:

ds[[*ds.cf.standard_names['air_temperature']]]

I also think this should just set crs as a coord. Not sure if there is an xarray issue for this.

I could not find an xarray issue on that (https://github.com/pydata/xarray/issues?q=is%3Aissue+mapping+MergeError). However I am still unconvinced whether grid mappings should be considered coordinates (see https://docs.xarray.dev/en/stable/user-guide/terminology.html#term-Coordinate).

Perhaps that is far-fetched, but would you consider a cf-xarray re-implementation of assign? I think it could assign by standard name, and overcome any xarray incompatibility.

ds.cf.assign(air_temperature=darray)

I do something like that at https://github.com/juseg/hyoga/blob/v0.3.0/hyoga/core/accessor.py#L134-L186.

dcherian commented 1 month ago

could not find an xarray issue on that I am still unconvinced whether grid mappings should be considered coordinates

https://github.com/pydata/xarray/issues/6447 . In the xarray model "coordinates" is the only way to link variables to each other. That's why I think they should be... Either way, I think this merge should succeed, whether the variable is promoted to coordinates is a somewhat arbitrary implementation detail.

Perhaps that is far-fetched, but would you consider a cf-xarray re-implementation of assign?

Not far fetched at all! I'd be happy to merge a PR adding this.

juseg commented 1 month ago

OK I understand your point with the coordinates now. And great, I will be working on an assign PR then!

dcherian commented 1 month ago

Thanks!