pangeo-data / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
182 stars 32 forks source link

"input_dims" #362

Open agmeyer4 opened 1 month ago

agmeyer4 commented 1 month ago

When attempting to run

regridder = xe.Regridder(grid_in,grid_out,method='conservative',input_dims=('south_north','west_east'))

Where grid_in and grid_out are dictionaries with keys = ['lat','lon','lat_b','lon_b'], I get an error:

TypeError: xesmf.frontend.BaseRegridder.__init__() got multiple values for keyword argument 'input_dims'

Without declaring the input_dims, I always get the warning about xesmf automatically choosing the dimensions to use.

It seems that in frontend.py at line 928, the BaseRegridder is instantiated with both input_dims=input_dims as well as **kwargs, of which input_dims is a key. If I add the following above the super().init() call in Regridder.init(), it works as expected:

if 'input_dims' in list(kwargs.keys()):
     input_dims = kwargs['input_dims']
     kwargs.pop('input_dims')

However, I do not know how this would affect the parameters set in lines 916-920. Is there an easier/safer/better fix?

aulemahal commented 1 month ago

Hi @agmeyer4,

If I recall correctly, I wrote this but didn't expect users to pass input_dims to Regridder. But it's a bug that I didn't cover the case. The idea that you would pass different names for the lon and lat coordinates in the init versus the call itself is not something I thought of, but I see no reason for it not to work if the user wants to do so.

Your solution is good, and should be done for output_dims too in that case. PRs are welcome, we can guide you through the steps if you need help. Otherwise, we'll try to fit this in before the next release, according to the personal time available from the maintainers.

agmeyer4 commented 1 month ago

@aulemahal I'm a bit confused. Are you suggesting that I should instead be calling something like this?

regridder = xe.Regridder(grid_in,grid_out,method='conservative')
regridded_ds = regridder(ds,input_dims=('south_north','west_east'))

When I run the code above, I receive another error:

TypeError: BaseRegridder.__call__() got an unexpected keyword argument 'input_dims'

The docs for the regridder() call inputs say

indata : numpy array, dask array, xarray DataArray or Dataset. If not an xarray object or if input_dìms was not given in the init, the rightmost two dimensions must be the same as ds_in. Can have arbitrary additional dimensions.

This leads me to believe I should include input_dims in my xe.Regridder() instantiation, hence the need for my bug fix. Am I missing something here?

aulemahal commented 1 month ago

Sorry, that was unclear. By "unexpected", I was talking about having a grid definition that doesn't match the regridding dataset. An "expected" use case would have grid_in be a xarray Dataset that looks very similar to the ds sent to regridder(ds), in most case grid_in and ds are the same thing.

But that is when we assume the spatial coordinates will have the common lat, lon names or have the proper attributes so that cf-xarray detects them as the latitude and longitude. In your case, the data looks like it does not follow such standards, which justifies your usage of the input_dims explicit argument. However, the current Regridding.__init__ does not consider that case.

The doc for regridder() comes from BaseRegridder which is why it mentions input_dims, an argument of BaseRegridder.__init__.

agmeyer4 commented 1 month ago

Got it, that does make sense. The patch I've used seems to work in my case. I'm new to pull requests, so guidance on how to do that or your inclusion of the fix in your next version would be great.