pangeo-data / pangeo

Pangeo website + discussion of general issues related to the project.
http://pangeo.io

let's enumerate all the ways to represent a "grid" in python #356

Closed rabernat closed 4 years ago

rabernat commented 6 years ago

As discussed in the Pangeo meeting, there are currently many different models for a grid in python. Some of these are explicit, some are implicit in how packages work. I am talking about both structured and unstructured grids.

Let's try to put together a list of what these are and how they differ. For now, I will just throw out some links:

What else should be on this list?

cc @dopplershift, @bekozi, @rsignell-usgs

JiaweiZhuang commented 5 years ago

"Provide a remap-to interface (da.remap.remap_to([('lat', 0.5), ('lon', 0.75)], how='nearest'))"

So that tells me that you need an object underneath this that "understands" 'lat' and 'lon' directly, rather than indexes, and with an abstraction that provides the same interface regardless of grid type.

A quick comment -- whether a special notion of 'lat'/'lon' is necessary depends on the requirement of regridding algorithms. Xarray has a general purpose interp() that works for any dimensions, maybe named 'x', 'y', 'lon', 'eastward' or whatever. It will work on lat/lon dimensions, except that it is not aware of:

Indeed, special treatment of 'lat'/'lon' would help with many geoscience problems like those above (and would definitely be useful for xESMF). I am not sure xarray wants that, though, because it is more of a general-purpose array library in which no dimension should be special. An extension like geoxarray seems a more appropriate place.

rabernat commented 5 years ago

The document David linked to is almost three years old. It was a brainstorming exercise we did at the very first Pangeo meeting. I would not use it to guide this discussion.

ChrisBarker-NOAA commented 5 years ago

“A quick comment -- whether a special notion of 'lat'/'lon' is necessary depends on the requirement of regridding algorithms. Xarray has a general purpose interp()”

I was perhaps too brief. I didn't mean lat/lon as opposed to x/y or northing/easting (though that is an issue it would be nice to address somehow).

Rather, I meant some kind of "world" coordinates as distinct from grid coordinates. That is, there may be an arbitrary relationship between where a point is and which array index holds the values associated with that point.

The location represented by index i may be nowhere near that of index i+j: the entire UGRID problem.

Xarray (or any software) needs a higher-level abstraction for "location".
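
To make the distinction concrete, here is a minimal sketch in plain Python. The `nodes` list and the `nearest_node` helper are hypothetical names invented for illustration and are not part of xarray, UGRID, or any package mentioned in this thread. It only shows that index adjacency says nothing about spatial adjacency, so a lookup by location needs its own abstraction.

```python
import math

# Hypothetical unstructured mesh: node index -> (x, y) "world" position.
# Node numbering is arbitrary, so consecutive indices need not be neighbours.
nodes = [
    (0.0, 0.0),     # index 0
    (100.0, 50.0),  # index 1: far away from index 0
    (0.1, 0.0),     # index 2: right next to index 0
    (50.0, 50.0),   # index 3
]


def distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def nearest_node(x, y):
    """Return the index of the node closest to the world location (x, y).

    This is a brute-force scan; it stands in for whatever "location"
    abstraction a real grid object would provide.
    """
    return min(range(len(nodes)), key=lambda i: distance(nodes[i], (x, y)))
```

Here `nearest_node(0.09, 0.0)` picks index 2 even though index 1 is the "next" index after 0, which is the kind of mapping plain positional indexing cannot express.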

shoyer commented 5 years ago

My rough thinking here is that this could be enabled on the xarray side if we make the notion of an "index" first-class in the data model, e.g., as described under "Flexible indexes" in the development roadmap @jhamman and I drafted last summer at SciPy: http://xarray.pydata.org/en/stable/roadmap.html#flexible-indexes

Unfortunately I haven't had as much time to work on this as I would like, so we're still in the early stages of this refactor. It would be great if others could help out with this (see https://github.com/pydata/xarray/issues/1603).

From the perspective of what this flexible index API could look like, here are the pandas.Index methods we make use of in xarray currently:

There are lots of other features that could logically go on a "grid"/"N-dimensional index" too, e.g., interpolation, resampling, differential operators. It would be great to define common interfaces for all these (and maybe even put those interfaces in xarray).

The only interface which xarray absolutely needs to be useful is alignment, because basically every operation in xarray that involves more than one object calls align() internally.
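
As a rough illustration of what alignment means, here is a plain-Python sketch of inner-join alignment on coordinate labels. The `align_inner` function and the dict-based data are assumptions made for this example; this is not xarray's implementation, only the idea behind calling `align()` with an inner join before combining two objects.

```python
def align_inner(*labelled):
    """Inner-join alignment of 1-D labelled data.

    Each argument is a dict mapping a coordinate label to a value.
    Returns new dicts restricted to the labels present in every input,
    all in the same (sorted) label order.
    """
    common = set(labelled[0])
    for data in labelled[1:]:
        common &= set(data)
    order = sorted(common)
    return [{label: data[label] for label in order} for data in labelled]


# Two "arrays" defined on partially overlapping coordinates.
a = {10.0: 1.0, 20.0: 2.0, 30.0: 3.0}
b = {20.0: 5.0, 30.0: 6.0, 40.0: 7.0}

# Align first, then combine element by element.
a_aligned, b_aligned = align_inner(a, b)
total = {label: a_aligned[label] + b_aligned[label] for label in a_aligned}
```

Any custom "grid" index would need to supply an equivalent of the label comparison done here, whatever its notion of equal or matching locations turns out to be.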

shoyer commented 5 years ago

I'm afraid I'll miss the call tomorrow during my commute. But I would be happy to weigh in on any discussion / proposals that come out of this.

(My general experience has been that large calls/meetings are not especially effective, but I'm glad there's such robust interest in this topic!)

rabernat commented 5 years ago

This meeting will start in 10 minutes. https://columbiauniversity.zoom.us/j/640510854

Here is a preliminary agenda: https://docs.google.com/document/d/19PTpG0EDZiW-vULQ1EOjvGTbB-S3_0D5wI2INyFroB4/edit?usp=sharing

rabernat commented 5 years ago

@ChrisBarker-NOAA - are you planning to join us? We would love to have you in the discussion.

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] commented 5 years ago

This issue has been automatically closed because it had not seen recent activity. The issue can always be reopened at a later date.

rabernat commented 5 years ago

Definitely not resolved, StaleBot!

Meteodan commented 5 years ago

Hi all,

I've been reading this thread with great interest, since I am currently in the process of refactoring some old kludgy code from my graduate school days that handles analysis and plotting of high-resolution cloud-resolving model output. The CRMs I work with all have the common thread that they are based on a staggered grid in the horizontal and vertical (specifically an Arakawa C-grid). But many of them are designed to run in "idealized" mode where the grid is not typically georeferenced, by which I mean it does not specify the horizontal coordinates in terms of lat/lon, but rather a more generic physical rectilinear grid. I personally run these and other models both in "idealized" mode and in the "real-data" geo-referenced mode discussed in this thread, and I know this is very common, especially in the high-resolution storm modeling community. So first, I just wanted to bring up this particular use case to add to the very good discussion above, since I think it was mentioned only in passing.

Second, my code does handle locating and plotting the various variables on their native grid, whether it be the cell centers or faces, but it does so in a rather application-specific and kludgy way. I plan on putting it up on GitHub as soon as I can get through this current refactoring session, which is long overdue. Hopefully that won't take too terribly long. More to the point though, when I started the refactoring, I thought, "Surely someone has written a library somewhere that handles all the grid-related manipulations associated with staggered grids already, so I may not have to waste as much time cleaning my own implementation up". That's when I stumbled on @ChrisBarker-NOAA's gridded package and later this thread. In my admittedly cursory look so far, it's not clear to me whether the gridded package can handle the aforementioned case where the grid is not represented in lat/lon coordinates. I could generate "fake" lat/lon coordinates using, say, a flat-earth approximation, but I was hoping there would be a more "native" way.

Any insight and discussions would be greatly appreciated, and I'm happy to continue this sort of discussion and help out in the larger efforts being discussed here as much as I can.

ChrisBarker-NOAA commented 5 years ago

@Meteodan:

Gridded does currently assume lat-lon coords. But other than the naming, it is actually treating them as orthogonal coordinates. That is, it is not handling projections or wrapping around the earth, etc.

So it would be easy to adapt to other coordinates. In fact, I'd like to do that anyway to support projected coordinates.

Meteodan commented 5 years ago

@ChrisBarker-NOAA

Thanks for the reply! So if I'm understanding correctly, I could just pass in my coordinate arrays for the node_lat, node_lon, etc. keyword arguments and it'll just work?

Perhaps this isn't the best place for this, but is there a way to use gridded to compute "corner" points of a staggered grid? I need this for using matplotlib pcolor. Would it be as simple as just passing in the appropriate coordinate arrays for the edges/centers when constructing a Variable object and selecting them appropriately in the call to pcolor?
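
For reference, when only cell centers are available on a rectilinear axis, corner coordinates can be estimated independently of gridded. The sketch below is plain Python; `edges_from_centers` is a hypothetical helper, and it assumes that interior edges sit midway between neighbouring centers. On a C-grid the face coordinates may already supply these edges directly, in which case no estimate is needed.

```python
def edges_from_centers(centers):
    """Estimate n + 1 cell edges from n cell centers along one axis.

    Interior edges are midpoints between neighbouring centers; the two
    outer edges are extrapolated by half a cell width.
    """
    if len(centers) < 2:
        raise ValueError("need at least two centers to estimate edges")
    inner = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    first = centers[0] - (centers[1] - centers[0]) / 2
    last = centers[-1] + (centers[-1] - centers[-2]) / 2
    return [first, *inner, last]


# Example: three cells of width 1 in x, two cells of width 20 in y.
x_edges = edges_from_centers([0.5, 1.5, 2.5])
y_edges = edges_from_centers([10.0, 30.0])
```

The resulting edge arrays are one element longer than the data along each axis, which is the layout matplotlib's pcolor-style functions expect for flat shading.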

rabernat commented 4 years ago

I just learned about some new packages for geo-aware grid interpolation.

yosoyjay commented 4 years ago

Summarizing some offline discussion with @rabernat, @kwilcox, @hetland and @ocefpaf on efficient and accurate approaches to interpolation and the requisite identification of a point-in-a-cell in unstructured triangular, curvilinear quads, and mixed types (tris + quads):

cc @rsignell-usgs

ChrisBarker-NOAA commented 4 years ago

and suggests using a Cell Tree method (Garth and Joy, 2010) as a solution.

Implementation in 2D here:

https://github.com/NOAA-ORR-ERD/cell_tree2d

(and on pip and conda-forge)

Not all that well documented, but ask if you want to figure out how to use it.

We are using it operationally for triangular and quad grids, via gridded.

The tree itself is built on bounding boxes, so you can use any cell shape, as long as you provide a way to do a final "point in cell" check.
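
The two-stage idea (cheap bounding-box rejection, then an exact point-in-cell test) can be sketched in plain Python as below. This is not the cell_tree2d API and not a cell tree: the function names are made up for this example, and the linear scan over cells stands in for the tree traversal, which is where the real speedup comes from.

```python
def bounding_box(cell):
    """Return (xmin, ymin, xmax, ymax) for a cell given as (x, y) vertices."""
    xs = [p[0] for p in cell]
    ys = [p[1] for p in cell]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_cell(point, cell):
    """Even-odd ray-casting test for a simple polygon of any vertex count."""
    x, y = point
    inside = False
    n = len(cell)
    for i in range(n):
        x1, y1 = cell[i]
        x2, y2 = cell[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def locate(point, cells):
    """Return the index of the first cell containing point, or None."""
    x, y = point
    for index, cell in enumerate(cells):
        xmin, ymin, xmax, ymax = bounding_box(cell)
        # Stage 1: cheap bounding-box rejection.
        if not (xmin <= x <= xmax and ymin <= y <= ymax):
            continue
        # Stage 2: exact test, which works for triangles and quads alike.
        if point_in_cell(point, cell):
            return index
    return None


# A mixed mesh: one triangle and one quad.
cells = [
    [(0, 0), (1, 0), (0, 1)],
    [(1, 0), (2, 0), (2, 1), (1, 1)],
]
```

Points lying exactly on a cell edge are a known weak spot of ray casting and are not handled specially in this sketch.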

The paper does talk about it being amenable to multiprocessing, but it was not obvious to us how to do that, and we didn't try -- GPU or conventional.

ChrisBarker-NOAA commented 4 years ago

@Meteodan:

In reply to the above: I think so, but please post an issue on the gridded repo to discuss.

https://github.com/NOAA-ORR-ERD/gridded

-CHB

hetland commented 4 years ago

I wrote some thoughts on interpolating from a quad grid to random points over on the discourse site.

Many of these topics are already covered here and elsewhere, but I wanted to gather these ideas and put them in one place. Also, since it's just a discussion of algorithms, not specific code or packages, discourse seemed more appropriate. I'd be happy to move the discussion here if that's more appropriate.

botzill commented 4 years ago

Hi.

Sorry about this stupid question, I'm new to this, but can any of the tools enumerated here help me divide the entire world map into regions (rectangles or circles) with radius R? I would like to vary R to get different numbers of regions. In the end I would like to get the coordinates (lat, long) of these regions, or the coordinates of the rectangles (top-left and bottom-right corners) as lat, long. Any help in the right direction would be really appreciated.

Thx!

ChrisBarker-NOAA commented 4 years ago

"Sorry about this stupid question, I'm new to this but, is there any of the tool enumerated that can help me to divide the entire world map into regions/rectangles/circles with radius R?"

Well, the world is an ellipsoid, so you can't really divide it up into rectangles or circles.

Triangles perhaps.

Or you could choose a projection, and then rectangles would be doable. But they would suffer from whatever distortions that projection provides.
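
As one illustration of the projection route, the sketch below tiles the globe with boxes that are rectangular in plain lat/lon degrees. The function name `latlon_tiles` and the `(south, west, north, east)` tuple layout are assumptions made for this example, not something taken from any of the packages discussed here.

```python
import math


def latlon_tiles(step_deg):
    """Tile the globe with lat/lon boxes of at most step_deg degrees per side.

    Returns a list of (south, west, north, east) tuples in degrees. The box
    size is shrunk slightly where needed so the boxes cover the globe exactly.
    """
    if not 0 < step_deg <= 180:
        raise ValueError("step_deg must be in the range (0, 180]")
    n_lat = math.ceil(180 / step_deg)
    n_lon = math.ceil(360 / step_deg)
    dlat = 180 / n_lat
    dlon = 360 / n_lon
    tiles = []
    for i in range(n_lat):
        south = -90 + i * dlat
        for j in range(n_lon):
            west = -180 + j * dlon
            tiles.append((south, west, south + dlat, west + dlon))
    return tiles


# Changing the step changes the number of regions.
tiles = latlon_tiles(30)
```

These boxes are equal in degrees, not in area: their east-west extent on the ground shrinks with the cosine of latitude, which is exactly the kind of distortion mentioned above.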

What is your use case here?

-CHB

botzill commented 4 years ago

Thx @ChrisBarker-NOAA.

I know it will not be perfect, but that is not an issue: we can tolerate some overlap. I'm not looking for perfection; an approximation works as well.

Thx.

bluetyson commented 3 years ago

Interesting discussion, thanks!

bluetyson commented 3 years ago

Came across this while looking for anyone who might have worked on grid merging that deals with edge effects.

hetland commented 3 years ago

I'm not sure exactly what you mean. Can you provide an example?

RichardScottOZ commented 3 years ago

Things like Grid Merge in Intrepid, but an open-source version, rather than rolling your own mathematically.