Closed: trondactea closed this issue 2 years ago
I think you need to actually consolidate the metadata in a separate step. See here
> The conversion went well with rechunker, but when trying to read the dataset using `xarray.open_zarr` it fails due to missing `.zmetadata`.
Can you share the full error traceback you obtained?
The full traceback is shown below when I try to run:

```python
import gcsfs
import xarray as xr

fs = gcsfs.GCSFileSystem()  # assumption: fs was a gcsfs filesystem in the original script

for var_name in ["thetao"]:
    zarr_url = f"gs://shared/zarr/target.zarr/{var_name}"
    mapper = fs.get_mapper(zarr_url)
    ds = xr.open_zarr(mapper, consolidated=True)
```
Traceback:

```
During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/jovyan/datasets/run_zarr_tests.py", line 32, in <module>
    ds = xr.open_zarr(mapper, consolidated=True)
  File "/opt/conda/lib/python3.9/site-packages/xarray/backends/zarr.py", line 768, in open_zarr
    ds = open_dataset(
  File "/opt/conda/lib/python3.9/site-packages/xarray/backends/api.py", line 495, in open_dataset
    backend_ds = backend.open_dataset(
  File "/opt/conda/lib/python3.9/site-packages/xarray/backends/zarr.py", line 824, in open_dataset
    store = ZarrStore.open_group(
  File "/opt/conda/lib/python3.9/site-packages/xarray/backends/zarr.py", line 384, in open_group
    zarr_group = zarr.open_consolidated(store, **open_kwargs)
  File "/opt/conda/lib/python3.9/site-packages/zarr/convenience.py", line 1183, in open_consolidated
    meta_store = ConsolidatedMetadataStore(store, metadata_key=metadata_key)
  File "/opt/conda/lib/python3.9/site-packages/zarr/storage.py", line 2590, in __init__
    meta = json_loads(store[metadata_key])
  File "/opt/conda/lib/python3.9/site-packages/fsspec/mapping.py", line 139, in __getitem__
    raise KeyError(key)
KeyError: '.zmetadata'
```
Ah ok, so your options are

```python
ds = xr.open_zarr(mapper, consolidated=False)
```

or

```python
from zarr.convenience import consolidate_metadata

consolidate_metadata(mapper)
ds = xr.open_zarr(mapper, consolidated=True)
```
Are you suggesting that we should automatically consolidate the target within rechunker? I thought that having `.zmetadata` available speeds up performance for large datasets. If I can create the metadata using the zarr function, that works well, of course. For me, automatic creation of `.zmetadata` would be very useful, but I don't have deep experience with zarr. Thanks for your help.
> I thought that having `.zmetadata` available speeds up performance for large datasets.

It can speed up initializing the dataset itself (`xr.open_zarr`) if the underlying store (GCS in this case) is slow to list. There is no performance impact after that.
That makes sense. Thanks, @rabernat and @jbusecke.
First, thanks for this great toolbox!

I have to rechunk an existing global zarr dataset (GLORYS ocean model) with existing chunks (1, 50, 2041, 4320) (time, depth, lat, lon). Using this global dataset, I frequently extract regional domains that are typically 10x10 degrees lat-lon in size. I thought quicker read access would be achieved if I rechunked to (324, 50, 100, 100).

The conversion went well with rechunker, but when trying to read the dataset using `xarray.open_zarr` it fails due to missing `.zmetadata`. The original `.zarr` dataset has consolidated metadata available.

Is there an option to create the metadata, or is my approach wrong here? My code for converting the existing zarr dataset is below. I appreciate any help here!

Thanks, Trond