pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0
3.55k stars 1.06k forks source link

API for multi-dimensional resampling/regridding #486

Open shoyer opened 9 years ago

shoyer commented 9 years ago

This notebook by @kegl shows a nice example of how to use pyresample with xray: https://www.lri.fr/~kegl/Ramps/edaElNino.html#Downsampling

It would nice to build a wrapper for this machinery directly into xray in some way.

xref #475

cc @jhamman @rabernat

rabernat commented 9 years ago

Pyresample is probably overkill for that case. Aggregating / decimating regular lat-lon grids could probably be done much more simply. For example

N = 10
fac = 2
x = np.arange(N, dtype=np.float64)
np.add.reduceat(x, np.arange(0,N,fac)) / fac

This gives array([ 0.5, 2.5, 4.5, 6.5, 8.5])

This type of resampling has the advantage of preserving certain integral invariants, as opposed to the nearest neighbor resampling in the example above. (Imagine if there had been lots of spatial variance below the 5 degree scale in that example--it would have been aliased horribly. That was only avoided because the original field was very smooth.) It is also very fast.

Pyresample seems best for complicated transformations from one map projection to another. I'm not sure I fully understand how xray handles grids where the coordinates are themselves 2d fields, as in this example.

shoyer commented 9 years ago

Indeed, SciPy's ndimage.interpolation.zoom would probably be more appropriate (and faster) for regular grids.

Xray currently doesn't have any built-in support for handling projected data, but basic selection and regridding (from explicit arrays of 2D coordinates) seems in scope for the project.

jhamman commented 9 years ago

I'd be interested in helping build a wrapper / api that supports the following types of regridding operations. I don't think pyresample handles all of these:

  1. Bilinear interpolation
  2. Bicubic interpolation
  3. Distance-weighted average remapping
  4. Nearest neighbor remapping
  5. Conservative (area) remapping
  6. Largest area fraction remapping

I would also like to see some better support for 2D coordinate variables. These are specifically outlined in the cf-conventions in section: 5.2. Two-Dimensional Latitude, Longitude, Coordinate Variables. Maybe we can open an additional issue for that...

forman commented 8 years ago

@jhamman: any progress on this? Our team would be happy to contribute as we have similar requirements in our project.

rabernat commented 8 years ago

@forman I am starting to suspect that this might be possible to implement through #818

jhamman commented 8 years ago

@forman - no progress on my end and no plans to work on this in the next 6 months. If your team is interested in working on this, we can discuss the api further and I'm happy to provide input and guidance along the way. One option is to wrap pyresample and possibly fill in some of the missing pieces in their api. I'd like to see a user interface that looks something like this:

da.resample_like(other, kind='bilinear', dims=('lat', 'lon'))

As @rabernat mentions, for some remapping/resampling, the multi-dimensional groupby will help with some of these applications.

rabernat commented 8 years ago

I feel like the biggest application of the multi-dimensional groupby will be with "conservative resampling" and coarse-graining, where you want to make sure to conserve certain integrals (e.g. total heat content) while changing coordinates.

pyresample will be more useful for fine-graining and interpolating missing data.

godfrey4000 commented 7 years ago

I have an immediate need in this area. My objective is to create a tool that will enable arithmetic on variables defined on lattices whose points don't coincide. Through my attempts thus far, it has become clear that I need data structures that incorporate spacial indexing and lattice indexing.

Since I have to tackle this issue to proceed, I thought I should follow the thinking discussed in this forum, so that it may be useful to others.

rabernat commented 7 years ago

@godfrey4000: lots of us in the climate community would like xarray-backed regridding. It is a hard problem, however. It wold be great if you wanted to work on it.

At a recent workshop, a group of xarray users developed a draft design document for a regridding package. https://aospy.hackpad.com/Regridding-Design-Document-tENJARIeg83

Your comments on this would be very welcome. An open question is whether the regridding belongs within xarray or in a standalone package (which would of course have xarray as a dependency).

jhamman commented 7 years ago

@rabernat - I've more or less settled that this belongs in a separate package that uses xarray's accessor features.

I've setup a little project (xmap) to get this started that I hope we can garner some volunteers to help push forward (nudge @godfrey4000 and @forman): https://github.com/jhamman/xmap

There will likely be some overlap between features that xmap and xarray. In those cases, we can try to push relevant features toward xarray.

godfrey4000 commented 7 years ago

I'm ready to start working on this project. I already have a prototype regridding class that I developed as part of another project. Working on that, I discovered these points:

Some of these choices are:

Is there a style guide that I can/should follow? Something like this: https://google.github.io/styleguide/pyguide.html? Does it or something else define naming conventions?

shoyer commented 7 years ago

Is there a style guide that I can/should follow? Something like this: https://google.github.io/styleguide/pyguide.html? Does it or something else define naming conventions?

It's pretty standard to follow Python's PEP8 with NumPy-style docstrings.

I generally like the Google style guide, too, but it leans towards being overly strict -- sometimes it's OK to be more relaxed (e.g., it's rules for valid import statements). Also, the public facing version has gotten out of date (I think there are plans to update it).

duncanwp commented 7 years ago

@jhamman @godfrey4000 - I'm not sure of the status of this, but I'm the lead developer on a package called CIS which might be useful/relevant.

It was designed as a command line tool to allow easy collocation (resampling) between different model and observation datasets, but is now also a Python library. We spent a fair amount of time thinking about the various permutations and you can see some of the details in our paper here. Internally we currently use Iris Cube-like objects but it would be pretty easy to operate on xarray Datasets since they share a similar design.

The basic syntax is:

from CIS import read_data
X = read_data('some_obs_data.nc')
Y = read_data('some_other_data.nc')
X.sampled_from(Y)
# or...
Y.collocated_onto(X)

Happy to discuss further here, or in xmap.

PeterDSteinberg commented 7 years ago

Regridding is of interest to NASA and other clients of ours. It is important to them to be able to do broadcast operations between rasters that differ in resolution or are otherwise offset. We'll follow the XMap repo mentioned above ( @jhamman ) and see about building on that style. Our clients and open source tools like datashader for viz and elm for ML could use XMap and benefit from coordinate transformations and regridding. We have a meeting internally to discuss approaches to the coordinates' metadata and resampling / regridding and I'll be in touch further soon about how we can help here (see also the issues on this experimental earthio repo).

forman commented 7 years ago

@PeterDSteinberg please have a look at module gridtools.resampling of repo https://github.com/CAB-LAB/gridtools. There are various up- and downsampling methods, which can deal with NaNs, and which are fast as C thanks to JIT through Numba. We use this package successfully in two projects.

JiaweiZhuang commented 7 years ago

I've wrapped ESMF/ESMPy by xarray: https://github.com/JiaweiZhuang/xESMF

It supports remapping between arbitrary quadrilateral grids, using ESMF's regridding algorithms including bilinear, conservative, nearest neighbour, etc... See this notebook for an example.

The package is still preliminary but it already works. See "Issues & Plans" in the main page for more details.

rabernat commented 7 years ago

Awesome work @JiaweiZhuang! This could be a great way forward for this important need. I think lots of us would be keen to contribute to your project. I encourage you to add tests and docs...that will help other contributors feel comfortable getting involved.

kegl commented 7 years ago

Super cool, thanks!

JiaweiZhuang commented 7 years ago

@rabernat Thanks for the suggestion! I'll add tests&docs when time allows.

If you want to look into details: The package contains the two layers (explained in the "Design Idea" section). The first layer has nothing to do with xarray, but just provides a convenient way (only with numpy) to access a useful subset of ESMPy functions. This layer is important because ESMPy's API is too complicated, but once it is done it doesn't need to be changed too often. The second layer wraps the first layer using xarray. Most of the crafts will be added to the second layer.

As a temporary workaround, I've added another notebook for using the low-level wrapper, for interested developers.

darothen commented 7 years ago

If ESMF is the way to go, then some effort needs to be made to build conda recipes and other infrastructure for distributing and building the platform. It's a heavy dependency to haul around.

ocefpaf commented 7 years ago

then some effort needs to be made to build conda recipes and other infrastructure for distributing and building the platform.

Like https://github.com/conda-forge/esmf-feedstock :wink:

(Windows is still a problem b/c of the Fortran compiler.)

darothen commented 7 years ago

@ocefpaf Awesome, good to know that hurdle has already been leaped :)

JiaweiZhuang commented 7 years ago

@ocefpaf Any plan for Python3-compatible ESMPy? I only see Python2.7 here: https://github.com/conda-forge/esmpy-feedstock

ocefpaf commented 7 years ago

@JiaweiZhuang let's discuss that in the feedstock issue tracker to avoid cluttering xarray's.

JiaweiZhuang commented 6 years ago

I am thinking about the API design for xESMF (JiaweiZhuang/xESMF#9). Any comments are welcome 😃

shoyer commented 6 years ago

@mraspaud of pyresample expressed interest to me offline about bringing some of pyresample's resampling features into xarray -- welcome to the conversation!

mraspaud commented 6 years ago

thanks @shoyer

jhamman commented 6 years ago

@mraspaud - What functionality are you interested in bringing over? I've been watching pyresample for a while and would love to see our two packages leverage each other's functionality (where possible).

mraspaud commented 6 years ago

@jhamman One possibility would be to have a .resample on a DataArray (or equivalent independent function) that would be provided also a set of new coordinates, and that would return a new DataArray resampled to the new coordinates. One step further would be to implement this in sel or isel directly somehow.

shoyer commented 6 years ago

For nearest-neighbor style resampling, we already have support for 1-dimensional resampling in .reindex()/.sel(). It would feel pretty natural to add support for N-dimensional resampling, too, if those lookups can use an index of some sort (i.e., a KDTree) to do things efficiently.

mraspaud commented 6 years ago

@shoyer absolutely, I will look into it, soon I hope

stale[bot] commented 4 years ago

In order to maintain a list of currently relevant issues, we mark issues as stale after a period of inactivity

If this issue remains relevant, please comment here or remove the stale label; otherwise it will be marked as closed automatically