Is your feature request related to a problem?
When opening a Zarr dataset with xarray.open_zarr and then calling compute on a slice, the data comes back filled with NaNs if a Zarr chunk file is missing. This is problematic in some cases, because it makes it impossible to distinguish whether the NaNs are legitimate values in the data or the result of a missing chunk file. Checking for NaNs on large arrays is also expensive.
Describe the solution you'd like
Ideally, when calling compute on a slice of a Zarr dataset for which a chunk is missing, there should be an option that, by default, raises an error if a chunk file is missing.
For example:
import xarray

# Proposed API: error_on_missing_chunks is the requested option and does not exist yet.
dataset = xarray.open_dataset("path_to_zarr_with_missing_chunk_for_2021-01-02.zarr", error_on_missing_chunks=True)
data_slice = dataset.sel(time="2021-01-01")
data_slice.compute()  # All chunks present: returns the data as usual
data_slice = dataset.sel(time="2021-01-02")
data_slice.compute()  # Raises MissingChunkError("Could not retrieve data. At least one chunk for the selected slice is missing")
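What such a check would need to do can be sketched outside xarray, without scanning the data for NaNs. The following is only a rough illustration, assuming a local directory store in Zarr v2 layout (a `.zarray` JSON file next to the chunk files) and an array with at least one dimension. `find_missing_chunks` is a hypothetical helper, not part of xarray or zarr.

```python
import itertools
import json
import math
from pathlib import Path


def find_missing_chunks(array_dir):
    """Return the chunk keys a Zarr v2 array expects but that are absent on disk.

    Hypothetical helper, not part of xarray or zarr. Assumes a local
    directory store and an array with at least one dimension.
    """
    array_dir = Path(array_dir)
    meta = json.loads((array_dir / ".zarray").read_text())
    shape, chunks = meta["shape"], meta["chunks"]
    # Zarr v2 joins chunk indices with "." unless the metadata says otherwise.
    sep = meta.get("dimension_separator", ".")

    # Number of chunks along each dimension (the last chunk may be partial).
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]

    missing = []
    for index in itertools.product(*(range(n) for n in grid)):
        key = sep.join(str(i) for i in index)
        if not (array_dir / key).is_file():
            missing.append(key)
    return missing
```

It would be called on the directory of a single variable inside the store, for example `find_missing_chunks("path_to_zarr_with_missing_chunk_for_2021-01-02.zarr/some_variable")`, where `some_variable` is a placeholder name. Note that Zarr itself treats an absent chunk as valid and reads it as the fill value, so a missing key is only an error when the writer was expected to store every chunk.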
Describe alternatives you've considered
No response
Additional context
No response