zarr-developers / geozarr-spec

This document aims to provide a geospatial extension to the Zarr specification. Zarr specifies a protocol and format used for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grids of geospatial observations (including rasters).

My thoughts on coordinate #48

Open martindurant opened 1 month ago

martindurant commented 1 month ago

Sorry for getting distracted at the end of the geo-zarr meeting we just had (for those that were there). Here is a summary of what I was getting at.

(@rabernat, yes, I know this has been discussed many times over - apologies)

There are two principal parts to the coordinates problem:

Coordinate transform

A mechanism within zarr/xarray to find (each of) the coordinates of a given array position and the (fractional) array location of a given coordinate set. This should be a vectorized operation each way.

Currently, xarray supports explicit coordinate value arrays via the netCDF model well (and "flexible" indexes whose internals I don't understand well).

Crucially, I advocate that the transform mechanism is independent of the data domain, so that we don't treat "lon/lat" as special. This is because zarr and xarray are general purpose libraries, and we don't want to exclude microscopy, genetics and other fields with many users.
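To make the two directions concrete, here is a minimal sketch of one possible transform, assuming a plain 2-D affine mapping. The class name `Affine2D`, its method names, and the coefficient layout are hypothetical and are not part of zarr or xarray. It uses plain Python lists to stay dependency-free; with NumPy the same arithmetic would broadcast over whole arrays. Note that nothing in it refers to lon/lat: the axes are just `u` and `v`.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Affine2D:
    """Hypothetical affine map: u = a*i + b*j + c, v = d*i + e*j + f."""
    a: float
    b: float
    c: float
    d: float
    e: float
    f: float

    def index_to_coord(
        self, i: Sequence[float], j: Sequence[float]
    ) -> Tuple[List[float], List[float]]:
        """Array positions -> coordinates, for many positions at once."""
        u = [self.a * ii + self.b * jj + self.c for ii, jj in zip(i, j)]
        v = [self.d * ii + self.e * jj + self.f for ii, jj in zip(i, j)]
        return u, v

    def coord_to_index(
        self, u: Sequence[float], v: Sequence[float]
    ) -> Tuple[List[float], List[float]]:
        """Coordinates -> (possibly fractional) array positions."""
        det = self.a * self.e - self.b * self.d
        if det == 0:
            raise ValueError("transform is singular and has no inverse")
        i = [(self.e * (uu - self.c) - self.b * (vv - self.f)) / det
             for uu, vv in zip(u, v)]
        j = [(-self.d * (uu - self.c) + self.a * (vv - self.f)) / det
             for uu, vv in zip(u, v)]
        return i, j


# Example: 0.5-unit cells, origin at (100, 50), second axis decreasing.
t = Affine2D(a=0.5, b=0.0, c=100.0, d=0.0, e=-0.5, f=50.0)
coords = t.index_to_coord([0, 2], [0, 4])
positions = t.coord_to_index([100.25], [49.75])
```

An explicit coordinate array or an analytic (curvilinear) form could sit behind the same two-method interface, which is the sense in which the mechanism stays independent of the data domain.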

Coordinate definitions

In the meeting, a few specific (geo) coordinate definitions were mentioned:

plus, of course, netCDF explicit arrays (with or without CF). I also mentioned astro WCS as a reference point (which supports explicit, affine, and various analytic forms for arbitrary dimensionality with no geo reference; interestingly, it also applies to fields of tables).

I would suggest that it is the job of geo-zarr to build the converters between these styles of definition and the internal transform representation, such that you can round-trip coordinate information without losing accuracy.
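As a sketch of what a lossless round trip could look like, assume the simplest case: a 1-D explicit coordinate array (netCDF style) converted to and from a compact (origin, step, size) form. The function names are hypothetical. The converter accepts the compact form only if regenerating the values reproduces the input exactly; otherwise it returns `None` and the explicit array should be kept.

```python
from typing import List, Optional, Sequence, Tuple


def regular_to_explicit(origin: float, step: float, size: int) -> List[float]:
    """Expand a compact (origin, step, size) axis into explicit values."""
    return [origin + k * step for k in range(size)]


def explicit_to_regular(
    values: Sequence[float],
) -> Optional[Tuple[float, float, int]]:
    """Return (origin, step, size) only if it reproduces `values` exactly."""
    if len(values) < 2:
        return None
    origin = values[0]
    step = values[1] - values[0]
    # Accept the compact form only when the round trip is bit-for-bit exact.
    if regular_to_explicit(origin, step, len(values)) == list(values):
        return origin, step, len(values)
    return None


compact = explicit_to_regular([10.0, 10.25, 10.5, 10.75])
irregular = explicit_to_regular([0.0, 1.0, 3.0])
```

Checking exact equality rather than a tolerance is a deliberately strict design choice here: it never loses accuracy, at the cost of falling back to explicit arrays whenever floating-point rounding makes the regenerated values differ.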

dblodgett-usgs commented 1 month ago

Wish I had space to take part in this work more... sorry to pop into this issue out of the blue, but I can't resist.

I couldn't agree more, @martindurant.

A potential source for inspiration on this is the implementation of rectilinear, curvilinear, and discrete spatio-temporal array axes in the stars R package. @edzer may be able to weigh in / advise. https://r-spatial.github.io/stars/articles/stars4.html is probably a good place to start.

mdsumner commented 1 month ago

I'm also trying to find my feet in this Python-heavy space. Shouldn't this be a Zarr topic? Non-lonlat geography exists in "geo", and even xarray has recognized the need to move beyond degenerate rectilinear arrays as the most compact referencing model. Zarr itself needs these compact forms as well; it's more about graphics and model arrays than geo-anything. Ensuring and persisting the crs is more the geo part, in general terms the metadata and units of the coordinate system are crucial in any domain, independently of whether a NetCDF style or more general framework is used. I just worry this tent isn't broad enough, but I appreciate the importance (and brilliance) of Zarr. If it can get this smarter referencing for regular or graphics arrays, and not mix up regular-grids-devolved-to-longlat with real curvilinear cases, it will truly be a general and future-proof framework.

martindurant commented 4 weeks ago

> Shouldn't this be a Zarr topic?

Yes, certainly it could be copied there; or maybe the discussion of how coordinates are interpreted belongs in xarray? Maybe zarr simply exposes the attributes defining the coordinate mapping to other libraries, but personally I'd be happy to see the f(x, y, z, ...) and its inverse(s) defined in zarr.

> Ensuring and persisting the crs is more the geo part, in general terms the metadata and units of the coordinate system are crucial in any domain

Exactly. I particularly have in mind medical ("device" and "patient" coordinates, normally affine transforms) and astro (curvilinear celestial coordinates and physical units like wavelength), because of my background.

christophenoel commented 1 week ago

I just want to drop this here: the OGC specification that deals with all types of coverage and their encoding is OGC Coverage Implementation Schema 1.1: https://docs.ogc.org/is/09-146r6/09-146r6.html#39