zarr-developers / geozarr-spec

This document aims to provide a geospatial extension to the Zarr specification. Zarr specifies a protocol and format for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grids of geospatial observations (including rasters).

Compatibility with NCZarr. #22

Closed dblodgett-usgs closed 1 year ago

dblodgett-usgs commented 1 year ago

I've just left comments in #18 and #3 asking about this. Perhaps you can summarize your thoughts here, @christophenoel?

Scanning this issue: https://github.com/zarr-developers/zarr-specs/issues/41

I tend to think that NCZarr is a lower-level concern and that GeoZarr should try to be agnostic to the particular way that NetCDF attributes are encoded into Zarr. That is, I think that GeoZarr should work whether you also need features supported by NCZarr or are using vanilla Zarr. Someone with a deeper knowledge of the landscape will need to chime in here though. @rabernat or someone else -- can you recommend and pull in someone who might be able to help recommend a path forward on this topic?

dblodgett-usgs commented 1 year ago

From: 21-050r1_Zarr_Community_Standard.pdf

Beginning with NetCDF-C version 4.8.0, Unidata introduced experimental Zarr support into the NetCDF-C library. This was accomplished via creating a new specification - NCZarr - which is “similar to, but not identical with the Zarr Version 2 Specification.” Specifically, NCZarr adds two additional metadata files (“.nczarray” and “.nczattr”), which are not part of the Zarr V2 Spec. Since NCZarr stores are not fully compatible and interoperable with Zarr V2, this community standard excludes NCZarr. Work is ongoing to reconcile NCZarr and the architectural reasons that motivated its development with the forthcoming Zarr V3 Specification. Fortunately, the NetCDF-C library also supports reading / writing of data using the simpler Named Dimension convention described in 4.1.

This seems to indicate that what I stated above, that NCZarr is a lower-level concern, is basically true.
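
To make the quoted point concrete, here is a minimal sketch of how a tool could check whether a store listing contains the two NCZarr-only objects named in the passage above. The function name and the example key listing are hypothetical; only the object names ".nczarray" and ".nczattr" come from the quoted text.

```python
from posixpath import basename

# Object names quoted above as NCZarr-only additions to a Zarr V2 store.
NCZARR_SIDECAR_NAMES = {".nczarray", ".nczattr"}


def find_nczarr_sidecars(store_keys):
    """Return, sorted, the keys whose final path component is an NCZarr-only object."""
    return sorted(k for k in store_keys if basename(k) in NCZARR_SIDECAR_NAMES)


# Hypothetical key listing for a store holding one array named "temperature".
keys = [
    ".zgroup",
    "temperature/.zarray",
    "temperature/.zattrs",
    "temperature/.nczarray",
    "temperature/.nczattr",
    "temperature/0.0",
]

print(find_nczarr_sidecars(keys))
```

A reader that only understands Zarr V2 would simply not look for these keys, which is why they can be treated as a lower-level detail.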

christophenoel commented 1 year ago

I don't think this really helps, but I found the following section written by the team in the Cloud-native Format TradeOff report (HDSA project):

NCZarr origins from the need of a pure C implementation of the Zarr API. Starting from 2018, Unidata NetCDF team joined the discussion and provided effort to bring support of Zarr datastores to the existing netcdf-c library. This effort resulted in the creation of an extension of the Zarr specification called NCZarr, which provides a data model close to the existing netcdf-4 data model. Starting from NetCDF version 4.8.0 (April 2021), the Zarr (and NCZarr) data formats are supported by the netcdf-c library. Since then, the netcdf-c library can provide access to cloud storage such as Amazon S3. The first NCZarr version relies on special objects that are not part of the Zarr specification (and not supported by the official Zarr library). In the new version of NCZarr, this is no longer the case. The new NCZarr version relies only on pure Zarr objects, using predefined named attributes as a convention. Those specific attributes shall not generate any error when reading Zarr files but might be ignored depending on the Zarr library implementation used.
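
As a rough illustration of the last two sentences, the sketch below separates NCZarr-specific attributes from the rest of an array's attributes, so that a reader that does not understand them can ignore them. The `_nczarr` prefix, the function name, and the example attribute contents are assumptions made for illustration and are not taken from the report.

```python
import json

# Assumption: NCZarr-specific attribute names share this prefix.
ASSUMED_NCZARR_PREFIX = "_nczarr"


def split_attributes(zattrs_text, prefix=ASSUMED_NCZARR_PREFIX):
    """Split a JSON attribute document into (generic, reserved) dictionaries."""
    attrs = json.loads(zattrs_text)
    generic = {k: v for k, v in attrs.items() if not k.startswith(prefix)}
    reserved = {k: v for k, v in attrs.items() if k.startswith(prefix)}
    return generic, reserved


# Hypothetical attribute document; the "_nczarr_attr" payload is made up.
example_zattrs = json.dumps(
    {
        "standard_name": "air_temperature",
        "units": "K",
        "_nczarr_attr": {"types": {"units": "|S1"}},
    }
)

generic, reserved = split_attributes(example_zattrs)
print(sorted(generic))   # attributes any Zarr reader can use as-is
print(sorted(reserved))  # attributes only an NCZarr-aware reader interprets
```

A GeoZarr reader written this way would behave the same on a vanilla Zarr store, where the reserved dictionary is simply empty.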

dblodgett-usgs commented 1 year ago

That aligns with what's been discussed elsewhere. I'll close this issue, as I think we've determined that NCZarr is not an issue for GeoZarr to overcome.