Open adair-kovac opened 3 years ago
Maybe @martindurant has some insights?
Some thoughts:
acl="public-read"
.consolidate_zarr could be made to accept extra parameters to define how the .zmetadata file is made, perhaps even accept a file-like object to write into, so the user gets complete control
(missing_exceptions kwarg to get_mapper), but we can talk about what the best defaults are.
This is a minor issue with an invalid data setup that's easy to get into. I'm reporting it here more for documentation than because I expect a fix.
Quick summary: If you consolidate metadata on a public zarr dataset in S3, the .zmetadata files end up permission-restricted. So if you try reading with xarray 0.19, it gives an unclear error message and fails to open the dataset, while reading with xarray <=0.18.x goes fine. Contrast this with the nice warnings you get in 0.19 if the .zmetadata just doesn't exist.
How this happens
People who wrote data without consolidated=True in past versions of xarray might run into this issue, but in our case we actually did have that parameter originally.
I've previously reported a few issues with the fact that xarray will write arbitrary zarr hierarchies if the variable names contain slashes, and then can't read them properly. One consequence of this is that data written with consolidated=True still doesn't have .zmetadata files where they're needed for xarray.open_mfdataset to read them.
If you try to add the .zmetadata by running the metadata consolidation directly on the cloud bucket in S3, it writes the .zmetadata files, but their permissions are restricted to the user who uploaded them, even if you're writing to a public bucket. (It's an AWS thing.)
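One way to repair a store that is already in this state is to reset the ACL on just the metadata objects. The following is a minimal sketch, not a tested recipe: it assumes an fsspec-style filesystem such as s3fs.S3FileSystem, relying on its find() and chmod(path, acl) methods, and it assumes the bucket's settings still allow per-object ACLs. The helper name make_zmetadata_public and the bucket path in the usage comment are hypothetical.

```python
def make_zmetadata_public(fs, root):
    """Give every .zmetadata object under `root` a public-read ACL.

    `fs` is assumed to behave like s3fs.S3FileSystem: find() lists all
    object keys below a prefix and chmod() sets a canned ACL on one key.
    Returns the list of paths whose ACL was changed.
    """
    changed = []
    for path in fs.find(root):
        # Only touch the consolidated-metadata files, not the chunks.
        if path.rsplit("/", 1)[-1] == ".zmetadata":
            fs.chmod(path, acl="public-read")
            changed.append(path)
    return changed


# Hypothetical usage (needs s3fs and credentials with write access):
#   import s3fs
#   fs = s3fs.S3FileSystem()
#   make_zmetadata_public(fs, "my-bucket/my-dataset.zarr")
```

Taking the filesystem as an argument keeps the helper independent of how credentials are configured, and limiting the change to .zmetadata keys avoids rewriting ACLs on data that is already readable.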
Why it's a problem
It would be nice if, when xarray goes to read this data, it noticed that it has access to the data but not to any usable .zmetadata and emitted the same warning it gives when .zmetadata doesn't exist. Instead it fails on an uncaught PermissionError: Access Denied, and it's not clear from the output that this is just a .zmetadata issue and that the user can still get the data by passing consolidated=False.
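The behavior wished for here can be approximated on the user side. This is a rough sketch under stated assumptions, not how xarray itself handles the case: open_with_fallback is a hypothetical wrapper, and open_fn stands for any opener that accepts a consolidated keyword the way xarray.open_zarr does. Note that a PermissionError raised for the data itself would simply be raised again on the retry.

```python
import warnings


def open_with_fallback(open_fn, store, **kwargs):
    """Try consolidated metadata first; fall back if it is unreadable.

    `open_fn` is any callable with an xarray.open_zarr-like signature,
    i.e. it accepts the store plus a `consolidated` keyword.
    """
    try:
        return open_fn(store, consolidated=True, **kwargs)
    except PermissionError as err:
        # Treat an unreadable .zmetadata like a missing one: warn and retry.
        warnings.warn(
            f"Could not read consolidated metadata ({err!r}); "
            "retrying with consolidated=False, which may be slower.",
            RuntimeWarning,
        )
        return open_fn(store, consolidated=False, **kwargs)


# Hypothetical usage (needs xarray and an S3-capable fsspec backend):
#   import xarray as xr
#   ds = open_with_fallback(xr.open_zarr, "s3://my-bucket/my-dataset.zarr")
```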
Another problem with this situation is that it causes data that reads just fine in xarray 0.18.x without even a warning message to suddenly give Access Denied from the same code when you update to xarray 0.19.
Workaround
If you're trying to read a dataset that has this issue, you can get the same behavior as in previous versions of xarray by passing consolidated=False when opening it.
The data loads fine, just more slowly than if the .zmetadata were accessible.
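The original snippet is not preserved in this copy of the issue. As a stand-in, here is a minimal sketch of the workaround, assuming the dataset is opened with xarray.open_zarr; the wrapper name open_unconsolidated, the open_fn parameter, and the example URL are hypothetical and exist only to make the sketch self-contained.

```python
def open_unconsolidated(store, open_fn=None, **kwargs):
    """Open a zarr store while skipping the .zmetadata lookup."""
    if open_fn is None:
        # Imported lazily so the sketch can be read without xarray installed.
        import xarray as xr

        open_fn = xr.open_zarr
    # consolidated=False makes the reader discover metadata key by key
    # instead of requesting the (inaccessible) .zmetadata object.
    return open_fn(store, consolidated=False, **kwargs)


# Hypothetical usage (needs xarray and an S3-capable fsspec backend):
#   ds = open_unconsolidated("s3://my-bucket/my-dataset.zarr")
```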
Stacktrace