zarr-developers / geozarr-spec

This document aims to provide a geospatial extension to the Zarr specification. Zarr specifies a protocol and format used for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grids of geospatial observations (including rasters).

Use Tile Matrix Set to describe multiscales #44

Open thomas-maschler opened 4 months ago

thomas-maschler commented 4 months ago

This PR implements the changes discussed in #30 and during the Zarr Sprint on Feb. 8, 2024 (participants: @maxrjones and @thomas-maschler).

It refactors the current multiscales metadata attribute and replaces the current dataset definition with the OGC Two Dimensional Tile Matrix Set standard. This change will allow for more flexibility when defining the layout of multiscales and embrace already existing standards instead of reinventing the wheel.

The Tile Matrix Set standard includes all information currently covered by the dataset definition and includes additional information on chunk layout, pixel size, and origin of the matrix.
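For readers unfamiliar with the standard, the kind of information a Tile Matrix Set carries can be sketched as follows. This is a minimal illustration, not the metadata layout proposed in this PR: it assumes a hypothetical three-level pyramid on the WebMercatorQuad grid with 256 x 256 chunks, and the field names follow the JSON encoding of the OGC standard as I understand it. How the object is embedded in the multiscales attribute is not shown.

```python
import json

# Assumptions for illustration only: WebMercatorQuad grid in EPSG:3857,
# 256 x 256 pixel tiles (chunks), three zoom levels.
TILE_SIZE = 256
LEVEL0_CELL_SIZE = 156543.03392804097  # metres per pixel at level 0
ORIGIN = [-20037508.342789244, 20037508.342789244]  # top-left corner (x, y)
RENDERING_PIXEL_SIZE_M = 0.00028  # standardized 0.28 mm pixel used for scale denominators


def tile_matrix(level: int) -> dict:
    """Describe one pyramid level: pixel size, origin, and tile (chunk) layout."""
    cell_size = LEVEL0_CELL_SIZE / 2**level
    return {
        "id": str(level),
        # CRS units are metres here, so no extra metres-per-unit factor is needed.
        "scaleDenominator": cell_size / RENDERING_PIXEL_SIZE_M,
        "cellSize": cell_size,
        "cornerOfOrigin": "topLeft",
        "pointOfOrigin": ORIGIN,
        "tileWidth": TILE_SIZE,
        "tileHeight": TILE_SIZE,
        "matrixWidth": 2**level,
        "matrixHeight": 2**level,
    }


tile_matrix_set = {
    "id": "WebMercatorQuad",
    "crs": "http://www.opengis.net/def/crs/EPSG/0/3857",
    "tileMatrices": [tile_matrix(level) for level in range(3)],
}

print(json.dumps(tile_matrix_set, indent=2))
```

Each entry in tileMatrices corresponds to one resolution level; tileWidth and tileHeight would map to the chunk shape, cellSize to the pixel size, and pointOfOrigin to the origin of the matrix.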

felixcremer commented 4 months ago

Do you have a link to the OGC Tile Matrix Set standard? I am currently working on https://github.com/JuliaDataCubes/PyramidScheme.jl, a Julia package for generating and working with pyramid datasets, mainly for plotting, and I aim to be compliant with GeoZarr when reading and writing these datasets.

I will take a more in-depth look in the next few days and try to implement this standard in Julia.

thomas-maschler commented 4 months ago

Here is the link: https://docs.ogc.org/is/17-083r4/17-083r4.html

briannapagan commented 3 months ago

@thomas-maschler we discussed this in the SWG meeting today: before approving PRs like this, it would be helpful to have an example Zarr store for testing interoperability. Can you provide one? A few of us are available for testing.

thomas-maschler commented 3 months ago

@briannapagan, initially I agreed with @maxrjones that he would give it a first try; he was planning to add some extra functionality to ndpyramids. However, if he hasn't managed to find the time for this, I should be able to do it and create some example Zarr stores with different overview layouts/TMS.

maxrjones commented 3 months ago

My apologies, I haven't found time for this yet.

briannapagan commented 2 months ago

We have some folks interested in having a dedicated discussion about this PR and understanding some of its implications. Can @maxrjones @thomas-maschler @felixcremer @wietzesuijker join our next bi-weekly Zarr call?

felixcremer commented 2 months ago

I will most likely not be able to attend this week's GeoZarr call, since Wednesday is a public holiday in Germany.

I worked on implementing the multiscale functionality in PyramidScheme.jl, and I am more and more convinced that the multiscale specification should be independent of the GeoZarr specification. Building pyramids of a dataset is not restricted to geospatial data; it is also used in bioimaging, for example. See the ongoing discussion about multiscales in Zarr: https://github.com/zarr-developers/zarr-specs/issues/125. So I would suggest not defining multiscale images as part of the GeoZarr spec, but rather working on a domain-agnostic multiscale convention and linking to it from the GeoZarr spec once it is finished.

As a side note, a recurring source of confusion when implementing TMS for GeoZarr was that in TMS the concept of "tiles" is a central part of the specification. In contrast, the Zarr specs present n-dimensional arrays to the user as a single entity, and the chunking structure is an (important) implementation detail. In practice this means that when users query a subset of a Zarr array a, in every Zarr implementation they would simply write some form of a[start_index:end_index]; requests are made at the pixel level and the implementation takes care of looking up the correct chunks. Queries into a TMS, on the other hand, are explicitly tile-based: the user requests the tiles for a given bounding box and is left with the overhead of concatenating the results. In my PyramidScheme.jl implementation it felt awkward to mix these two worlds of tile-based and element-based access, and I would be interested to see an implementation that combines TMS and Zarr for some inspiration. Until then I tend to favor the multiscale convention proposal linked above, since it seems more in line with the general Zarr interface idea.
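To make the contrast concrete, here is a minimal Python sketch of the bookkeeping a tile-based client has to do for a request that an array interface expresses as a single slice. The function names are hypothetical, the window is given in pixel indices rather than as a bounding box, and the tile grid is assumed to start at the array origin; none of this is taken from PyramidScheme.jl or from the TMS standard itself.

```python
def tiles_for_window(row_start, row_stop, col_start, col_stop, tile_height, tile_width):
    """Return the (tile_row, tile_col) indices that intersect a half-open pixel window."""
    first_row, last_row = row_start // tile_height, (row_stop - 1) // tile_height
    first_col, last_col = col_start // tile_width, (col_stop - 1) // tile_width
    return [
        (tile_row, tile_col)
        for tile_row in range(first_row, last_row + 1)
        for tile_col in range(first_col, last_col + 1)
    ]


def crop_within_mosaic(row_start, row_stop, col_start, col_stop, tile_height, tile_width):
    """Slices that cut the requested window out of the concatenated tiles."""
    row_offset = row_start % tile_height
    col_offset = col_start % tile_width
    return (
        slice(row_offset, row_offset + (row_stop - row_start)),
        slice(col_offset, col_offset + (col_stop - col_start)),
    )


# Element-based access on an array `a` would be a single expression:
#     subset = a[300:700, 100:300]
# Tile-based access first needs the list of tiles to fetch ...
tiles = tiles_for_window(300, 700, 100, 300, tile_height=256, tile_width=256)
# ... and, after concatenating them, a crop back to the requested window.
crop = crop_within_mosaic(300, 700, 100, 300, tile_height=256, tile_width=256)
print(tiles)
print(crop)
```

With element-based access both steps happen inside the Zarr implementation; with tile-based access they become the caller's responsibility.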

christophenoel commented 2 months ago

> Building pyramids of a dataset is not restricted to geospatial data but is also used in bioimaging

Hi Felix,

Not being restricted to geospatial data is something pyramids share with many aspects covered by GeoZarr, which aims to reuse existing standards (such as OGC Tile Matrix Set) and to indicate where in the encoding they must be placed.

However, it is important to note that the pyramid structure is a key aspect for GeoZarr as it aims to offer functions equivalent to alternative formats such as COG within the Zarr format. Additionally, pyramid structures for Earth Observation (EO) data have their own particularities, such as resolution, compared to geospatially agnostic pyramids.

christophenoel commented 2 months ago

Feel free to check the playlist below for a demonstration of pyramids encoded in Zarr datasets: https://www.youtube.com/watch?v=NYhh66EstnY&list=PLzPGC4s5HQOPdeLoK1MXK6gEa1x2Az8Dn

thomas-maschler commented 2 months ago

@briannapagan is it still worth joining tomorrow's call? I will only be able to join during the second half. If this can wait another two weeks, I might be able to prepare a POC.

mdsumner commented 2 months ago

I'm following this, but sadly it's unlikely that I can make the call.

maxrjones commented 2 months ago

I received a calendar notice that the meeting was moved to next week; unfortunately I am unavailable on May 8th at 11 ET but could join in two weeks.