zarr-developers / geozarr-spec

This document aims to provide a geospatial extension to the Zarr specification. Zarr specifies a protocol and format used for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grid of geospatial observations (including rasters).

Are vector geometries in scope? #28

Open martinfleis opened 9 months ago

martinfleis commented 9 months ago

Hi,

In https://github.com/zarr-developers/geozarr-spec/issues/3#issuecomment-1400032128 @benbovy briefly mentioned vector data cubes and our interest in finding a way of serializing them in Zarr.

Vector data cubes are n-d arrays where at least one dimension is indexed by an array of vector geometries (see https://xvec.readthedocs.io/en/stable/intro.html). Since the data in the cube are n-d arrays, we would like to have a way of storing such objects in Zarr. There is a CF way of doing that, but it is a bit limited (e.g. only one dimension can be indexed by geometries). The other, related use case is when vector geometries are variables rather than coordinates, for which there is no support anywhere as far as I can tell, though I suppose the same encoding could be applied with some additional steps (like flattening of the array).
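The "flattening" step mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the geometries are held as WKT strings in a regular, non-empty nested list, and the helper names `flatten` and `unflatten` are hypothetical, not part of xvec, Zarr, or any spec. The idea is that an n-d variable of geometries becomes a 1-D sequence plus its shape, so that a 1-D geometry encoding can be reused and the shape restored on read.

```python
def flatten(nested):
    """Flatten a regular, non-empty nested list; return (flat_values, shape)."""
    shape = []
    level = nested
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0]
    flat = nested
    # Remove one level of nesting for every dimension after the first.
    for _ in shape[1:]:
        flat = [item for sub in flat for item in sub]
    return list(flat), tuple(shape)


def unflatten(flat, shape):
    """Rebuild the nested list from the flat values and the stored shape."""
    for size in reversed(shape[1:]):
        flat = [flat[i:i + size] for i in range(0, len(flat), size)]
    return flat


# A 2 x 3 variable whose values are geometries (WKT strings, for illustration).
geoms = [
    ["POINT (0 0)", "POINT (1 1)", "POINT (2 2)"],
    ["POINT (3 3)", "POINT (4 4)", "POINT (5 5)"],
]
flat, shape = flatten(geoms)
restored = unflatten(flat, shape)
```

Here `flat` is the 1-D sequence that would be handed to the geometry encoder, and `shape` would need to be stored alongside it (for example as an attribute) to restore the original layout.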

We would like to roll out a prototype soon, so I wanted to check whether you are interested in staying in sync and eventually including support for geometries in the GeoZarr spec, or whether it is out of scope. There are certainly some differences (e.g. the CRS may be treated differently as there is no grid) and it may cause some unnecessary friction within the spec, so I totally understand if that is out of scope. It surely does not follow "recommendations for storing multidimensional georeferenced grid of geospatial observations" as there is no grid.

There is a bit of discussion on how to implement that in https://github.com/xarray-contrib/xvec/issues/48 where I prototyped a way based on the GeoArrow encoding of geometries into a set of arrays and another one that uses well-known binary representation, to give you an idea what the options are (+ the CF encoding linked above).
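To make the two options concrete, here is a minimal, standard-library-only sketch of what each kind of encoding might look like. It is not the prototype from the linked xvec issue, and all function names are hypothetical. The first part packs points as well-known binary (WKB) blobs into one byte buffer with an offsets array; the second stores linestrings as flat coordinate arrays plus offsets, a simplified layout in the spirit of GeoArrow. Either way, the result is a small set of plain arrays of the kind Zarr can store.

```python
import struct


def point_to_wkb(x, y):
    """Encode a 2D point as WKB: byte order 1 (little-endian), type 1 (Point), x, y."""
    return struct.pack("<BIdd", 1, 1, x, y)


def pack_wkb(blobs):
    """Concatenate variable-length WKB blobs into one buffer plus offsets."""
    offsets = [0]
    for blob in blobs:
        offsets.append(offsets[-1] + len(blob))
    return b"".join(blobs), offsets


def unpack_wkb(data, offsets):
    """Split the buffer back into one WKB blob per geometry."""
    return [data[start:end] for start, end in zip(offsets[:-1], offsets[1:])]


def linestrings_to_arrays(lines):
    """Store linestrings as flat x/y arrays plus per-geometry offsets."""
    xs, ys, offsets = [], [], [0]
    for line in lines:
        for x, y in line:
            xs.append(x)
            ys.append(y)
        offsets.append(len(xs))
    return xs, ys, offsets


# WKB option: two points become one byte buffer and an offsets array.
blobs = [point_to_wkb(1.0, 2.0), point_to_wkb(3.0, 4.0)]
data, wkb_offsets = pack_wkb(blobs)

# Coordinate-array option: two linestrings with 2 and 3 vertices.
lines = [[(0.0, 0.0), (1.0, 1.0)], [(2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]]
xs, ys, line_offsets = linestrings_to_arrays(lines)
```

The WKB variant is opaque to anything that does not parse WKB but handles mixed geometry types uniformly; the coordinate-array variant keeps coordinates directly readable as numeric arrays at the cost of needing a separate layout per geometry type.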

benbovy commented 9 months ago

> It surely does not follow "recommendations for storing multidimensional georeferenced grid of geospatial observations" as there is no grid.

It is in the scope of the "Mixing Data" goal, though :)

IIUC, conventions in ZEP4 are composable, so if supporting geometries in the GeoZarr spec introduces too much friction, maybe this could be addressed in another (sub-)convention? This makes me wonder if there is any notion of convention dependency or hierarchy in ZEP4. cc @rabernat