zarr-developers / geozarr-spec

This document aims to provide a geospatial extension to the Zarr specification. Zarr specifies a protocol and format for storing Zarr arrays, while the present extension defines conventions and recommendations for storing multidimensional georeferenced grids of geospatial observations (including rasters).

nco interoperability #40

Open christine-e-smit opened 4 months ago

christine-e-smit commented 4 months ago

Using ncks version 5.1.9, which is part of the nco tools, I was able to list a zarr store.

Data used: https://github.com/zarr-developers/geozarr-spec/issues/36

To reproduce:

  1. Download the data and unzip it.
  2. At the command line, run:
ncks -m "file:///YOUR/PATH/TO/GLDAS_NOAH025_3H.zarr#mode=nczarr,zarr"
christine-e-smit commented 4 months ago

The geotiff-like zarr store does not appear to work, but this is not terribly surprising: both nco and the python netCDF4 library use the NetCDF-C library to open zarr stores. I was unable to open this zarr store with the python netCDF4 library either (https://github.com/zarr-developers/geozarr-spec/issues/39), so there's a good chance the issue is in NetCDF-C itself.
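
One quick way to test that hypothesis (a sketch, assuming a netCDF4 build with NCZarr enabled; the store path and name geotiff_like.zarr are placeholders) is to hand the same NCZarr URL fragment to python netCDF4, which calls straight into NetCDF-C's nc_open:

import netCDF4

# Same URL syntax as the ncks commands above; if NetCDF-C is the culprit,
# this should fail the same way. Path is a placeholder for the
# geotiff-like store from issue #39.
ds = netCDF4.Dataset(
    "file:///YOUR/PATH/TO/geotiff_like.zarr#mode=nczarr,zarr", mode="r"
)
print(ds.variables.keys())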

christine-e-smit commented 4 months ago

Using the zarr store without compression, I was able to use ncks to create a subset as a new zarr store.

ncks -d latitude,0.,10. "file:///YOUR/PATH/zarr_no_compression.zarr#mode=nczarr,zarr" "file:///YOUR/PATH/subset.zarr#mode=nczarr,zarr"
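
For reference (a sketch, not from the thread), the equivalent subset in xarray would be a label-based slice on latitude; the output name subset_xr.zarr is illustrative:

import xarray as xr

ds = xr.open_zarr("/YOUR/PATH/zarr_no_compression.zarr")
# Label-based equivalent of `ncks -d latitude,0.,10.` (inclusive range).
subset = ds.sel(latitude=slice(0.0, 10.0))
subset.to_zarr("/YOUR/PATH/subset_xr.zarr", mode="w")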

I was then able to open the ncks-produced store with xarray:

In [1]: import xarray as xr

In [2]: ds = xr.open_zarr('subset.zarr')
<ipython-input-2-89d3eedb3e59>:1: RuntimeWarning: Failed to open Zarr store with consolidated metadata, but successfully read with non-consolidated metadata. This is typically much slower for opening a dataset. To silence this warning, consider:
1. Consolidating metadata in this existing store with zarr.consolidate_metadata().
2. Explicitly setting consolidated=False, to avoid trying to read consolidate metadata, or
3. Explicitly setting consolidated=True, to raise an error in this case instead of falling back to try reading non-consolidated metadata.
  ds = xr.open_zarr('subset.zarr')

In [3]: ds
Out[3]: 
<xarray.Dataset>
Dimensions:           (latitude: 40, nv: 2, longitude: 1440, time: 2)
Coordinates:
  * latitude          (latitude) float32 0.125 0.375 0.625 ... 9.375 9.625 9.875
  * longitude         (longitude) float32 -179.9 -179.6 -179.4 ... 179.6 179.9
  * time              (time) datetime64[ns] 2000-01-01 2000-01-02
Dimensions without coordinates: nv
Data variables:
    latitude_bounds   (latitude, nv) float32 dask.array<chunksize=(40, 2), meta=np.ndarray>
    longitude_bounds  (longitude, nv) float32 dask.array<chunksize=(1440, 2), meta=np.ndarray>
    time_bounds       (time, nv) datetime64[ns] dask.array<chunksize=(2, 2), meta=np.ndarray>
    variable          (latitude, longitude, time) float32 dask.array<chunksize=(40, 720, 1), meta=np.ndarray>
Attributes:
    history:  Thu Feb  8 12:09:18 2024: ncks -d latitude,0.,10. file:///Users...
    NCO:      netCDF Operators version 5.1.9 (Homepage = http://nco.sf.net, C...

In [4]: ds['latitude']
Out[4]: 
<xarray.DataArray 'latitude' (latitude: 40)>
array([0.125, 0.375, 0.625, 0.875, 1.125, 1.375, 1.625, 1.875, 2.125, 2.375,
       2.625, 2.875, 3.125, 3.375, 3.625, 3.875, 4.125, 4.375, 4.625, 4.875,
       5.125, 5.375, 5.625, 5.875, 6.125, 6.375, 6.625, 6.875, 7.125, 7.375,
       7.625, 7.875, 8.125, 8.375, 8.625, 8.875, 9.125, 9.375, 9.625, 9.875],
      dtype=float32)
Coordinates:
  * latitude  (latitude) float32 0.125 0.375 0.625 0.875 ... 9.375 9.625 9.875
Attributes:
    bounds:         latitude_bnds
    standard_name:  latitude
    units:          degrees_north
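
The consolidated-metadata warning above goes away if a .zmetadata key is written after the fact; a minimal sketch with the zarr convenience API:

import zarr
import xarray as xr

# Write a top-level .zmetadata key so readers can skip per-array probing.
zarr.consolidate_metadata("subset.zarr")

ds = xr.open_zarr("subset.zarr", consolidated=True)  # no fallback warning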
christine-e-smit commented 4 months ago

I tried using ncks to read something from s3 and ran into this error:

> ncks -m "s3://us-west-2.opendata.source.coop/zarr/geozarr-tests/zarr_no_compression.zarr" 
ncks: ERROR file "s3://us-west-2.opendata.source.coop/zarr/geozarr-tests/zarr_no_compression.zarr" not found. It does not exist on the local filesystem, nor does it match remote filename patterns (e.g., http://foo or foo.bar.edu:file).
ncks: HINT file-not-found errors usually arise from filename typos, incorrect paths, missing files, or capricious gods. Please verify spelling and location of requested file. If the file resides on a High Performance Storage System (HPSS) accessible via the 'hsi' command, then add the --hpss option and re-try command.
(nco) gs6102m1csmit:~/Projects/zarr_nyc_2024/nco % ncks -m "s3://us-west-2.opendata.source.coop/zarr/geozarr-tests/zarr_no_compression.zarr#mode=nczarr,zarr" 
ncks: INFO nco_fl_mk_lcl() failed to nc_open() this Zarr-scheme file even though NCZarr is enabled. HINT: Check that filename adheres to this syntax: scheme://host:port/path?query#fragment and that filename exists. NB: s3 scheme requires that netCDF be configured with --enable-nczarr-s3 option.
HINT: As of 20230321, a known problem is that NCO (and ncdump) have trouble reading compressed NCZarr datasets. This can manifest as error code -137, "NetCDF: NCZarr error". If the next line reports that error, the error may be due to this issue, i.e., to a codec issue uncompressing the dataset:
Translation into English with nc_strerror(-128) is "NetCDF: Attempt to use feature that was not turned on when netCDF was built."
ncks: ERROR file "s3://us-west-2.opendata.source.coop/zarr/geozarr-tests/zarr_no_compression.zarr#mode=nczarr,zarr" not found. It does not exist on the local filesystem, nor does it match remote filename patterns (e.g., http://foo or foo.bar.edu:file).
ncks: HINT file-not-found errors usually arise from filename typos, incorrect paths, missing files, or capricious gods. Please verify spelling and location of requested file. If the file resides on a High Performance Storage System (HPSS) accessible via the 'hsi' command, then add the --hpss option and re-try command.
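
One workaround (a sketch, not something tried in the thread) is to bypass NetCDF-C and read the S3 store through xarray's fsspec/s3fs path; anonymous access is an assumption about this bucket, and a custom endpoint may be needed depending on how the host exposes S3:

import xarray as xr

ds = xr.open_zarr(
    "s3://us-west-2.opendata.source.coop/zarr/geozarr-tests/zarr_no_compression.zarr",
    # "anon" is an assumption; some hosts also require
    # storage_options={"client_kwargs": {"endpoint_url": ...}}.
    storage_options={"anon": True},
)
print(ds)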