zarr-developers / n5py

Python implementation of the N5 file format
MIT License

Can't open a read-only Google Cloud Storage store #14

Open dstansby opened 9 months ago

dstansby commented 9 months ago

Zarr version

2.17.0

Numcodecs version

0.12.1

Python Version

3.11.7

Operating System

macOS

Installation

using conda

Description

I am trying to access a read-only Google Cloud Storage bucket, but my code is failing. It looks like it's because zarr is trying to write something to the bucket, but I haven't properly worked out what's going wrong.

If I'm doing something wrong, it would be nice to add an example to the documentation to make this easy to do in the future.

Steps to reproduce

import gcsfs
import zarr

bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
fs.ls(bucket)  # Works
store = fs.get_mapper(root=bucket)
group = zarr.group(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05")
_request non-retriable exception: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401
Traceback (most recent call last):
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401
Traceback (most recent call last):
  File "/Users/dstansby/software/hipct/hipct-reg/scripts/test_real_data.py", line 8, in <module>
    group = zarr.group(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05")
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/hierarchy.py", line 1427, in group
    init_group(store, overwrite=overwrite, chunk_store=chunk_store, path=path)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 668, in init_group
    _require_parent_group(path, store=store, chunk_store=chunk_store, overwrite=overwrite)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 313, in _require_parent_group
    _init_group_metadata(store, path=p, chunk_store=chunk_store)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 736, in _init_group_metadata
    store[key] = store._metadata_class.encode_group_metadata(meta)
    ~~~~~^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/zarr/storage.py", line 1449, in __setitem__
    self.map[key] = value
    ~~~~~~~~^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/mapping.py", line 171, in __setitem__
    self.fs.pipe_file(key, maybe_convert(value))
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 118, in wrapper
    return sync(self.loop, func, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 103, in sync
    raise return_result
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/fsspec/asyn.py", line 56, in _runner
    result[0] = await coro
                ^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 1268, in _pipe_file
    location = await simple_upload(
               ^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 1954, in simple_upload
    j = await fs._call(
        ^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 437, in _call
    status, headers, info, contents = await self._request(
                                      ^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/decorator.py", line 221, in fun
    return await caller(func, *(extras + args), **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 158, in retry_request
    raise e
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 123, in retry_request
    return await func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/core.py", line 430, in _request
    validate_response(status, contents, path, args)
  File "/Users/dstansby/miniconda3/envs/hipct/lib/python3.11/site-packages/gcsfs/retry.py", line 110, in validate_response
    raise HttpError(error)
gcsfs.retry.HttpError: Anonymous caller does not have storage.objects.create access to the Google Cloud Storage object. Permission 'storage.objects.create' denied on resource (or it may not exist)., 401

Additional output

No response

dstansby commented 9 months ago

To illustrate that this should work in theory, with tensorstore I don't have an issue and the following code works:

import tensorstore as ts
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"

dataset = ts.open({
    "driver": "n5",
    "kvstore": f"gs://{bucket}/LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0",
    "context": {
        "cache_pool": {
            "total_bytes_limit": 100_000_000
        }
    }
}).result()

x = dataset[0, 0, 0].read().result()
print(x)
d-v-b commented 9 months ago

cc @martindurant

martindurant commented 9 months ago

There is no .zgroup at the given path. I'm not sure of the semantics of zarr.group (as opposed to open_group()), but it appears to be "create if it doesn't exist". So gcsfs is doing the right thing.

What does exist in the location is attributes.json - is this a V3 thing? I can see it has some zarr-like information in it, but not the usual .z* stuff; e.g., here is the one inside s0:

{
  "axes": ["x", "y", "z"],
  "blockSize": [128, 128, 128],
  "compression": {"blocksize": 0, "clevel": 9, "cname": "zstd", "shuffle": 2, "type": "blosc"},
  "dataType": "uint16",
  "dimensions": [3020, 3412, 2829],
  "neuroglancer-pipeline-version": "1"
}
d-v-b commented 9 months ago

What does exist in the location is attributes.json - is this a V3 thing?

no, it's n5

martindurant commented 9 months ago

Is zarr.group expected to be able to read that?

Also, turning on the logger "gcsfs" would tell you what call is actually causing the exception: what is zarr trying to create?

import fsspec
fsspec.utils.setup_logging(logger_name="gcsfs")
martindurant commented 9 months ago

Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

d-v-b commented 9 months ago

Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

That's a question for @jbms :) I don't know much about tensorstore myself.

@dstansby you should probably look at N5FSStore, which is FSStore modified to support N5.

d-v-b commented 9 months ago

My apologies for presumptively pinging you, Martin; I didn't notice immediately that this was actually an N5 thing.

dstansby commented 9 months ago

Apologies, the path should have "/s0" on the end. I also understand that I should use open_array instead as I have an N5 array and not a group. Updated code (that gives me the same error):


import gcsfs
import zarr

bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
fs.ls(bucket)  # Works
store = fs.get_mapper(root=bucket)
group = zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")

So I'm not sure how I can combine an N5 store with the GCSFileSystem?

jbms commented 9 months ago

Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.

That is definitely something we could support pretty easily, and has been on the TODO list.

d-v-b commented 9 months ago

@dstansby does this work for you?

import zarr
import gcsfs
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
store = zarr.N5FSStore(url=bucket, fs=fs)
zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")
dstansby commented 8 months ago

Thanks, it did! I've opened a PR to the docs to make sure the working example doesn't get lost to a closed issue 😄

d-v-b commented 1 month ago

I transferred this issue from zarr-python to n5py.