pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.61k stars 1.08k forks source link

boundary conditions for differentiate() #6493

Open miniufo opened 2 years ago

miniufo commented 2 years ago

Is your feature request related to a problem?

I need to take centered finite difference of data of length N along the dimension 'X', with boundary conditions (BCs) specified in flexible ways. Before this, we need to pad data with BCs (length becoming N+2) so that the indicing will not be out-of-range.

Commonly used BCs are:

  1. fixed - fill with fixed values so derivatives at BCs are (BC - data[-1])/dx and (data[0] - BC)/dx;
  2. extend - fill BCs with second outer-most values so that derivatives at BCs are exactly zero;
  3. periodic - fill BCs cyclic so that the derivatives are also cyclic.

Describe the solution you'd like

The implementation of differentiate('X') would be like:

# padded with BCs into N+2
data_pad = pad_BCs(data, type='periodic')

# it is safe to take finite difference
for i in range(len(data))
    diff[i] = data_pad [i+1] - data_pad [i-1]

The pad_BCs function could be easily implemented with np.pad() function.

Then we can call:

data.differentiate('X', BCs='periodic')

We may also specify different kind of BCs at the two boundaries:

data.differentiate('X', BCs=['extend', 'fixed'], fill_values=0)

Describe alternatives you've considered

No response

Additional context

I am not clear how differentiate() is implemented and just want to know if this can be implemented in a straightforward way.

Illviljan commented 2 years ago

differentiate relies on np.gradient which appears to not support this, https://numpy.org/doc/stable/reference/generated/numpy.gradient.html.

https://github.com/pydata/xarray/blob/586992e8d2998751cb97b1cab4d3caa9dca116e0/xarray/core/dataset.py#L6606-L6611

https://github.com/pydata/xarray/blob/586992e8d2998751cb97b1cab4d3caa9dca116e0/xarray/core/duck_array_ops.py#L134-L137

TomNicholas commented 2 years ago

@miniufo have you seen xGCM? Your problem might be more easily solved using that library instead.

dcherian commented 2 years ago

+1 for xgcm.

On the xarray side, I think we should just recommend composing pad with diff or differentiate. We'll need to add a "extrapolate" option for the padded coordinate variables for this to work.

miniufo commented 2 years ago

Thanks to you guys here @Illviljan @TomNicholas @dcherian. I've been a user of xgcm for quite a time. So you can see my proposal just follows the style of xgcm.

I am working on my xinvert package, in which I may need some partial differential calculations. This can be done by xgcm quite well, but I am still worried about the metrics concept introduced by xgcm. I think this should be discussed over xgcm's repo.

For most of the cases, lat/lon-type grids are uniform and on the Arakawa A grid. So xarray's differentiate() is good enough with pad() (although it is experimental) for BCs, as suggested by @dcherian. We don't need stagged grid point and metrics, as in xgcm, but centered difference (a[i+1]-a[i-1]) will be good enough for A grid. This is simpler and do not make heavy dependence of the third-party package like xgcm.

I'll give a try with differentiate() and pad() to implement grad/div/vor... But some designs in xgcm also inspire me to make things much natural.

TomNicholas commented 2 years ago

I am still worried about the metrics concept introduced by xgcm. I think this should be discussed over xgcm's repo.

Please do raise an issue there!

We don't need stagged grid point and metrics, as in xgcm, but centered difference (a[i+1]-a[i-1]) will be good enough for A grid.

The new "grid ufuncs" functionality in xGCM (hopefully being released this week) allows you to write grid-aware functions that can do centered operations exactly like this. e.g. see here

https://github.com/xgcm/xgcm/blob/cb7feccb9e4331e71da6f034cbee13f43c3de76d/xgcm/test/test_grid_ufunc.py#L614

This is simpler and do not make heavy dependence of the third-party package like xgcm.

I do sympathise with this though.

But some designs in xgcm also inspire me to make things much natural.

If you tell us exactly what it is you're trying to do then there might be a neat solution.

cc @jbusecke

TomNicholas commented 2 years ago

On the xarray side, I think we should just recommend composing pad with diff or differentiate.

This is essentially just what we're doing in xGCM.

We'll need to add a "extrapolate" option for the padded coordinate variables for this to work.

We also ran into this, and would use it if there were an extrapolate option for xarray's pad.

jbusecke commented 2 years ago

Hi @miniufo et al., just my two cents:

This is simpler and do not make heavy dependence of the third-party package like xgcm.

That is a fair point, but I think there is a counterpoint to be made, that xgcm gives you some more functionality (especially with the new grid_ufuncs feature) with regard to array padding. As you note, this is not needed for your particular setup, but if you use xgcm, you would get the same functionality + at a later point you might get padding on complex grid topologies for free down the line. So in the end this seems like a tradeoff between adding more dependencies vs flexibility and generalizability in the future.

I'll give a try with differentiate() and pad() to implement grad/div/vor... But some designs in xgcm also inspire me to make things much natural.

This makes me think that you really want xgcm, because these properties will naturally be located on staggered grid positions, even if your data is originally on a A grid. And once you start to try to handle these cases it would appear to me that you duplicate some of the functionality of xgcm?

I am still worried about the metrics concept introduced by xgcm. I think this should be discussed over xgcm's repo.

I second others here and think it would be great to elaborate on this on the xgcm issue tracker. But I also want to point out, that using the metrics functionality is entirely optional in xgcm, so if you desire, you can roll your own logic on top of grid.diff/interp etc.

miniufo commented 2 years ago

Oh, I see the release of xgcm of 0.7.0. It is really a great update! I also find the boundary condition and grid_ufunc examples on the docs (still 0.6.0), which indeed may solve many of my problems. The grid-ufunc provides flexible building blocks for complicated cases. I'll spend some times trying the new version, re-think my cases in this great architecture, and report soon if I have problems with that. Thanks to you guys' great work!

A quite question is that has the xgcm been refactored using grid_ufunc? (I hope I could catch up with you guys).

jbusecke commented 2 years ago

yes all of the grid methods (grid.diff etc) are now internally using grid_ufuncs. The axis methods are still going through the old code path, but will be deprecated soon! Please let us know how you get along with the new functionality, we are very curious for user feedback!