pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License

Applying regridder on dataset drops coordinate attributes #286

Closed yt87 closed 10 months ago

yt87 commented 11 months ago

I am not sure whether this is a bug or an undocumented feature. Consider the following code:

import numpy as np
import xarray as xr
import xesmf as xe

ds = xr.tutorial.open_dataset(
    "air_temperature"
)
print(ds['lat'].attrs)  # input coordinate attributes before regridding
ds_out = xr.Dataset(
    {
        "lat": (["lat"], np.arange(16, 75, 1.0), {"units": "degrees_north"}),
        "lon": (["lon"], np.arange(200, 330, 1.5), {"units": "degrees_east"}),
    }
)
regridder = xe.Regridder(ds, ds_out, "conservative")
ds_regridded = regridder(ds)
print(ds['lat'].attrs)  # input coordinate attributes after regridding
print(ds_regridded['lat'].attrs)  # output coordinate attributes

The output is:

{'standard_name': 'latitude',
 'long_name': 'Latitude',
 'units': 'degrees_north',
 'axis': 'Y'}

{}

{'units': 'degrees_north'}

In fact, all coordinate attributes on the input are gone. Attributes for data variables, in this case air, are preserved. Interestingly, the regridder assigns attributes to lat and lon in the output (but not to time), copied from ds_out.

yt87 commented 11 months ago

A quick workaround is to pass ds.copy() to regridder
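The idea behind the workaround can be sketched without xESMF. This is an illustration only: `strip_coord_attrs` is a hypothetical stand-in for any call that clears coordinate attributes on its argument as a side effect, and `call_on_copy` is a hypothetical helper name, not part of xESMF or xarray.

```python
import numpy as np
import xarray as xr


def strip_coord_attrs(ds):
    """Hypothetical stand-in for a call that mutates its input's coordinate attrs."""
    for name in ds.coords:
        ds[name].attrs.clear()
    return ds


def call_on_copy(func, ds):
    """Hypothetical helper: pass a copy so side effects do not reach the caller's object."""
    return func(ds.copy())


ds_demo = xr.Dataset(
    {"air": (["lat"], np.zeros(3))},
    coords={"lat": ("lat", [10.0, 20.0, 30.0], {"units": "degrees_north"})},
)
out = call_on_copy(strip_coord_attrs, ds_demo)
print(ds_demo["lat"].attrs)  # the caller's attributes are still present
print(out["lat"].attrs)      # only the copy's attributes were cleared
```

A shallow copy is enough here because `Dataset.copy()` gives each variable its own attrs dictionary while still sharing the underlying data.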

aulemahal commented 11 months ago

@huard This looks like the same issue you were raising on slack!

The copy of attributes from ds_out is indeed a feature. As xESMF only handles the spatial dimensions, the regridding can only assign attributes to lat and lon in your case.

But the loss of attributes on ds is a problem. We need to investigate.

yt87 commented 11 months ago

I think it would make sense to preserve the non-spatial coordinate attributes, such as time in the above example. Regridding does not change their meaning.

aulemahal commented 11 months ago

The issue I see here is that the regridding does not know which coordinates of ds_out will appear on the regridded output, except for the spatial dims. Also, in your specific case, time was simply absent from ds_out. I think it makes more sense to preserve the attributes from the input (ds) exactly, instead of trying to guess how to "update" them following the regridding.

But if you have an idea on the logic behind this hypothetical update, this could be implemented!
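One possible reading of "preserve the attributes from the input exactly" is sketched below for the non-spatial coordinates. This is an assumption about the intended logic, not xESMF's implementation: `copy_coord_attrs` and its `skip` parameter are hypothetical names, and the two small datasets merely stand in for the regridder's input and output.

```python
import xarray as xr


def copy_coord_attrs(src, dst, skip=("lat", "lon")):
    """Hypothetical helper: copy coordinate attrs from src onto dst, except for `skip`."""
    for name in dst.coords:
        if name in src.coords and name not in skip:
            dst[name].attrs.update(src[name].attrs)
    return dst


# Stand-in for the input: every coordinate carries attributes.
src = xr.Dataset(
    coords={
        "time": ("time", [0, 1], {"long_name": "Time"}),
        "lat": ("lat", [10.0, 20.0], {"units": "degrees_north", "axis": "Y"}),
    }
)
# Stand-in for the regridded output: spatial attrs come from the target grid.
dst = xr.Dataset(
    coords={
        "time": ("time", [0, 1]),
        "lat": ("lat", [12.0, 18.0], {"units": "degrees_north"}),
    }
)
copy_coord_attrs(src, dst)
print(dst["time"].attrs)  # taken from the input
print(dst["lat"].attrs)   # left as provided by the target grid
```

Skipping the spatial coordinates keeps the existing behaviour of copying their attributes from ds_out, so no guess about how to "update" attributes is needed.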

yt87 commented 11 months ago

Sorry, I missed the keep_attrs argument to BaseRegridder.__call__. Interestingly, if set to True, it also preserves the attributes of the argument, that is, my problem disappears. I still think there is a bug: the input dataset should not be modified in either case.

yt87 commented 11 months ago

This seems to be an xarray issue:

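import numpy as np
import xarray as xr

def magnitude(a, b):
    func = lambda x, y: np.sqrt(x**2 + y**2)
    return xr.apply_ufunc(func, a, b, keep_attrs=False)

# The "x" coordinate carries an attribute before the call.
array = xr.DataArray([1, 2, 3], coords=[("x", [0.1, 0.2, 0.3], {'axis': 'X'})])
magnitude(array, -array)
print(array['x'].attrs)  # inspect the input's coordinate attrs after the call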

The last line prints an empty dictionary. The documentation (https://github.com/pydata/xarray/blob/main/xarray/core/computation.py#L836-L1210) states:

keep_attrs : {"drop", "identical", "no_conflicts", "drop_conflicts", "override"} or bool, optional
   - 'drop' or False: empty attrs on returned xarray object.

The documentation says "returned", not "passed", so this is likely not intentional behaviour.
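A small check along these lines can show whether a given xarray version mutates the input. It is only a sketch: `coord_attrs` is a hypothetical helper, and the result depends on the installed version, so no expected output is given.

```python
import copy

import numpy as np
import xarray as xr


def coord_attrs(obj):
    """Deep-copy every coordinate's attrs so later mutation cannot alter the snapshot."""
    return {name: copy.deepcopy(dict(obj[name].attrs)) for name in obj.coords}


array = xr.DataArray([1, 2, 3], coords=[("x", [0.1, 0.2, 0.3], {"axis": "X"})])

before = coord_attrs(array)  # snapshot taken before the call
result = xr.apply_ufunc(
    lambda x, y: np.sqrt(x**2 + y**2), array, -array, keep_attrs=False
)
input_mutated = coord_attrs(array) != before  # True if the call changed the input

print("input coordinate attrs mutated:", input_mutated)
```

Comparing snapshots rather than inspecting a single key catches any change to any coordinate, which makes the same check reusable around a regridder call.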

charlesgauthier-udm commented 11 months ago

I opened an issue on xarray regarding this bug.

huard commented 10 months ago

Should we close this here, then?

yt87 commented 10 months ago

I have nothing against closing it here.