pydata / xarray

N-D labeled arrays and datasets in Python
https://xarray.dev
Apache License 2.0

Raise an error when loading a missing Zarr chunk. #9701

Closed alvarosg closed 3 weeks ago

alvarosg commented 3 weeks ago

Is your feature request related to a problem?

When opening a Zarr dataset with xarray.open_zarr and then calling compute on a slice, the data comes back filled with NaNs if one of the Zarr chunk files is missing. This is a problem because it makes it impossible to tell whether the NaNs are legitimate values in the data or the result of a missing chunk file. Checking for NaNs on large arrays is also expensive.

Describe the solution you'd like

Ideally, when calling compute on a slice of a Zarr dataset for which a chunk is missing, there should be an option, enabled by default, that raises an error.

For example:

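import xarray

# error_on_missing_chunks is the proposed option; it does not exist in xarray today.
dataset = xarray.open_dataset("path_to_zarr_with_missing_chunk_for_2021-01-02.zarr", error_on_missing_chunks=True)

# All chunks for this date are present, so this computes normally.
data_slice = dataset.sel(time="2021-01-01")
data_slice.compute()

# A chunk for this date is missing, so this should fail loudly.
data_slice = dataset.sel(time="2021-01-02")
data_slice.compute()  # Raises MissingChunkError("Could not retrieve data. At least one chunk for the selected slice is missing")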

Describe alternatives you've considered

No response

Additional context

No response

dcherian commented 3 weeks ago

This is an upstream issue. When a chunk is missing, Zarr returns a chunk with every value set to fill_value. https://github.com/zarr-developers/zarr-python/issues/486
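
Until this is handled upstream, one possible workaround is to check the store for missing chunk files before computing. The sketch below is not part of xarray or zarr-python: find_missing_chunks is a hypothetical helper, and it assumes a Zarr v2 array stored as a plain directory on a local filesystem, with a .zarray metadata file and chunk files named by their grid index (for example 0.1). It uses only the standard library.

```python
import itertools
import json
import math
from pathlib import Path


def find_missing_chunks(array_dir):
    """Return the chunk keys a Zarr v2 array expects but that are absent on disk.

    Hypothetical helper: assumes a local directory store in Zarr v2 layout.
    """
    array_dir = Path(array_dir)
    meta = json.loads((array_dir / ".zarray").read_text())
    shape, chunks = meta["shape"], meta["chunks"]
    # Zarr v2 joins chunk indices with "." unless the metadata says otherwise.
    sep = meta.get("dimension_separator", ".")

    # Number of chunks along each dimension (ceiling division).
    grid = [math.ceil(s / c) for s, c in zip(shape, chunks)]

    missing = []
    for index in itertools.product(*(range(n) for n in grid)):
        # A zero-dimensional array stores its single chunk under the key "0".
        key = sep.join(map(str, index)) if index else "0"
        if not (array_dir / key).exists():
            missing.append(key)
    return missing
```

In an xarray-written store each variable is its own array directory, so a call might look like find_missing_chunks("store.zarr/temperature"), where both names are placeholders. A missing chunk is not always an error: some writers deliberately skip chunks that contain only the fill value, so treat the result as a list of candidates to investigate rather than proof of corruption.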