ocean-eddy-cpt / gcm-filters

Diffusion-based Spatial Filtering of Gridded Data
https://gcm-filters.readthedocs.io/
Other
37 stars 24 forks source link

Order of filter dimensions matters #125

Closed NoraLoose closed 2 years ago

NoraLoose commented 2 years ago

For most of our filter kernels, it matters whether the user does

filtered = filter.apply(field, dims=['y',  'x'])  # correct filtering

versus

filtered = filter.apply(field, dims=['x',  'y'])  # incorrect filtering

This came up in this issue in the context of filtering on a tripole grid. Our code needs the user to specify the correct order ['y', 'x'] because the tripolar exchanges are coded by assuming that y (latitude) is the second to last dimension:

https://github.com/ocean-eddy-cpt/gcm-filters/blob/75f172c7c74c1f2f3b91e36f7345bd138f01c881/gcm_filters/kernels.py#L32-L39

Similarly, other Laplacians compute wet masks for the southern (and western) cell edges:

https://github.com/ocean-eddy-cpt/gcm-filters/blob/75f172c7c74c1f2f3b91e36f7345bd138f01c881/gcm_filters/kernels.py#L279-L281

This also assumes that latitude is the second to last dimension.

We should do either of the following two things:

  1. Change the code to the effect that it can deal with an incorrect dimension order ['x', 'y'].
  2. Communicate more clearly in the API and documentation that the dimension order matters.
NoraLoose commented 2 years ago

We have to work on this issue to make progress toward publication in JOSS. Any ideas for elegant coding solutions?

rabernat commented 2 years ago

In the short term, I see no way around the user having to write

filtered = filter.apply(field, dims=['y', 'x'])

The reason is that there is know way for gcm-filters to know a-priori whether the order is "wrong". In the code above, the user is saying, the dimensions I want to filter are ['y', 'x']; we use a list (rather than a set) because the order matters. I literally would not know how to try to detect the error if the user passed ['x', 'y'], without making unacceptable assumptions about either the shape of the data (e.g. "x should be the last dimension") or the names of the coordinates (e.g. "x is always longitude"). So for the short term (e.g. the JOSS review), _I propose that we address this with documentation.*

In the longer term, there are two ways we could do better.

TomNicholas commented 2 years ago

(@rabernat you tagged the wrong @TomNicholas!)

Our code needs the user to specify the correct order ['y', 'x'] because the tripolar exchanges are coded by assuming that y (latitude) is the second to last dimension:

We reimplement our laplacians as xgcm grid_ufuncs. In that case, we would not be specifying dims but rather CF axis names to operate on.

I'm not sure I understand why rewriting as a grid ufunc would solve this dimension order issue. If the filter kernel is ultimately a (numpy universal) function which operates over two axes (x and y), then calling xr.apply_ufunc(da, kernel, core_dims=[['x', 'y']]) will move those two axes (now dimensions 'x' and 'y') to the last two axes of the array - it won't change the order of 'x' and 'y'.

rabernat commented 2 years ago

I'm not sure I understand why rewriting as a grid ufunc would solve this dimension order issue.

Because I think the dims argument would be completely unneeded. It would be inferred completely from the input data.

NoraLoose commented 2 years ago

Thanks for your perspectives and ideas, @rabernat and @TomNicholas!

I will try to put in a PR soon for a short-term solution (i.e., more documentation).