zarr-developers / zarr-python

An implementation of chunked, compressed, N-dimensional arrays for Python.
https://zarr.readthedocs.io
MIT License
1.51k stars 279 forks source link

FSStore silently returns bogus chunk data when FSMap passes an exception #1578

Open itcarroll opened 11 months ago

itcarroll commented 11 months ago

Zarr version

2.16.1

Numcodecs version

0.11.0

Python Version

3.10.12

Operating System

Linux

Installation

using conda

Description

The zarr.storage.FSStore chunk store currently ignores an important exception that the underlying fsspec.mapping.FSMap is able to provide when calling getitems. The FSMap detects when the reference file system is not reachable (because of network problems or missing/invalid credentials) and returns an exception. But because the FSStore.getitems method sets the on_error argument to "omit", the FSStore treats the values as missing. They're not missing, there was an error that the user needs to know about.

Steps to reproduce

import zarr
from fsspec.implementations.reference import ReferenceFileSystem

fo = {
    "version": 1,
    "refs": {
        ".zgroup": '{"zarr_format":2}',
        "time/.zarray": '{"chunks":[512],"compressor":null,"dtype":"<f8","fill_value":null,"filters":null,"order":"C","shape":[1],"zarr_format":2}',
        "time/0": [
            "s3://gesdisc-cumulus-prod-protected/GLDAS/GLDAS_NOAH025_3H.2.1/2018/152/GLDAS_NOAH025_3H.A20180601.0000.021.nc4",
            22998,
            4096,
        ],
    },
}
fs = ReferenceFileSystem(fo, remote_protocol="s3", remote_options={"key": "foo", "secret": "bar", "token": "baz"})
group = zarr.open_group(fs.get_mapper(""))
group["time"][:]

The returned value is bogus if the credentials are invalid, as above, but no error is raised. The correct value for "time" is array([9685260.]).

Additional output

No response

d-v-b commented 11 months ago

If I understand this issue correctly, the problem is that zarr is fine with one particular exception (the one raised when a chunk is missing), as that gets handled via the fill_value semantics. But I totally agree that ignoring all exceptions is a clumsy way to achieve this. I can think of a variety of solutions, but I think they have to happen on the fsspec side.

joshmoore commented 11 months ago

cc: @martindurant

itcarroll commented 11 months ago

@d-v-b I think you understand correctly. Why can't zarr's FSStore leave on_error at its default of "return" and then decide which error it wants to raise? The FSMap.getitems would return an instance of ReferenceNotReachable(RuntimeError) for this case.

martindurant commented 11 months ago

cc https://github.com/zarr-developers/zarr-python/pull/1237 tries to fix this, but I don't remember why it's not complete.

itcarroll commented 11 months ago

@martindurant I rebased your PR on main (locally, I mean, since I'm not a zarr dev) and saw the same pytest failures reported by the checks on your PR. Comments added there.