zarr-developers / geozarr-spec

This document aims to provide a geospatial extension to the Zarr specification. Zarr specifies a protocol and format used for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grids of geospatial observations (including rasters).

Exploring the Foundations and Goals of the GeoZarr Format #3

Closed christophenoel closed 1 year ago

christophenoel commented 1 year ago

Here is a series of ideas to initiate a discussion on the foundations of the GeoZarr format and to exchange views on what its goals should be.

GeoZarr was based on three fundamental principles:

  1. Provide cloud-native (optimised) access (i.e. an HTTP API that does not require an intermediate service)
  2. Support multidimensional data (hyperspectral, altitude, etc.)
  3. Provide valuable geospatial data description (not restricted to 2D!)
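
As a rough illustration of the first principle, the sketch below shows how a client could work out which chunk object to request directly over HTTP, with no intermediate service. It assumes the Zarr v2 convention of naming chunks by their grid indices joined with a "." separator. The store URL, array name and chunk shape are hypothetical and only serve the example.

```python
def chunk_key(index, chunk_shape, separator="."):
    """Return the Zarr v2 style key of the chunk that holds the element at `index`."""
    if len(index) != len(chunk_shape):
        raise ValueError("index and chunk_shape must have the same number of dimensions")
    # Integer division maps an element index to the index of its chunk on the chunk grid.
    return separator.join(str(i // c) for i, c in zip(index, chunk_shape))


def chunk_url(store_url, array_path, index, chunk_shape):
    """Build the URL a client could GET straight from object storage."""
    return "/".join([store_url.rstrip("/"), array_path.strip("/"), chunk_key(index, chunk_shape)])


# Hypothetical store and array layout (band, y, x), for illustration only.
url = chunk_url("https://example.com/data.zarr", "reflectance", (130, 2500, 700), (64, 1024, 1024))
print(url)  # https://example.com/data.zarr/reflectance/2.2.0
```

Because the key is computed on the client side, a plain static file server or object store is enough to serve a single pixel, band or time slice.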

In my opinion, this implies certain assumptions:

Regards,

Christophe

benbovy commented 1 year ago

Great to see discussions and ideas about a GeoZarr format happening here!

I landed here from this thread https://twitter.com/EvenRouault/status/1614053240508936192 about the CRS (grid mapping vs. WKT/PROJJSON), and I was wondering what is the scope of GeoZarr: is it specific to gridded (raster) geospatial data or does it aim at covering all kinds of geospatial datacubes?

Besides the data types mentioned in the current draft, another one is vector datacubes, although applications are rather limited compared to gridded datasets and I'm not sure at all what would be the best format to store vector datacubes (use arrow/parquet - https://github.com/geoarrow/geoarrow - with flattened data? create a zarr codec for geometry coordinates?).

More context on vector datacubes:

cc @edzer @martinfleis

christophenoel commented 1 year ago

I was wondering what is the scope of GeoZarr: is it specific to gridded (raster) geospatial data or does it aim at covering all kinds of geospatial datacubes?

Hi @benbovy! All doors are open, but the underlying Zarr is limited to multidimensional arrays. So all kinds of data that might be provided as an n-D array.

edzer commented 1 year ago

So all kinds of data that might be provided as an n-D array.

I guess you mean "as a collection of n-D arrays"?

Section 7.5 of the CF conventions points out that vector geometries (points, lines, polygons) can be associated with a dimension of a data cube.

christophenoel commented 1 year ago

I guess a collection of n-D arrays is an (n+1)-D array. :)
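
A small sketch of the point under discussion, using plain nested lists rather than any Zarr API: a collection of n-D arrays can be viewed as a single (n+1)-D array only when every member has the same shape. The helper names and the sample variables below are hypothetical.

```python
def shape(array):
    """Return the shape of a regular nested list as a tuple."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        if not array:
            break
        array = array[0]
    return tuple(dims)


def stack(arrays):
    """Stack equally shaped n-D arrays along a new leading axis."""
    shapes = {shape(a) for a in arrays}
    if len(shapes) != 1:
        raise ValueError(f"cannot stack arrays with different shapes: {sorted(shapes)}")
    return list(arrays)


# Two hypothetical 2-D variables on the same 2 x 3 grid stack into a 3-D array.
temperature = [[1, 2, 3], [4, 5, 6]]
pressure = [[7, 8, 9], [10, 11, 12]]
print(shape(stack([temperature, pressure])))  # (2, 2, 3)

# A variable on a different grid cannot join the same (n+1)-D array.
elevation = [[0, 0, 0, 0], [0, 0, 0, 0]]
try:
    stack([temperature, elevation])
except ValueError as err:
    print(err)
```

Variables on different grids, or with different dimensions, would still need to be stored as separate arrays in a group.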

christophenoel commented 1 year ago

One of the key objectives of GeoZarr is to provide a standard format for all kinds of EO multi-dimensional data. This requires defining conventions for at least the following aspects:

dblodgett-usgs commented 1 year ago

@christophenoel can you expand on:

NetCDF already has its NCZarr project which did not meet our concerns. GeoZarr reuses the CF conventions to describe the data, but does not pursue the same goals (and aims to be certainly simpler).

Are there specific NCZarr details you can point to? We were discussing NCZarr on the call just now and wanted to understand better how the current GeoZarr spec relates.