Open dstansby opened 9 months ago
To illustrate that this should work in theory: with tensorstore I don't have this issue, and the following code works:
import tensorstore as ts

bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
dataset = ts.open({
    "driver": "n5",
    "kvstore": f"gs://{bucket}/LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0",
    "context": {
        "cache_pool": {
            "total_bytes_limit": 100_000_000
        }
    }
}).result()
x = dataset[0, 0, 0].read().result()
print(x)
cc @martindurant
There is no .zgroup at the given path. I'm not sure of the semantics of zarr.group() (as opposed to open_group()), but it appears to be "create if it doesn't exist". So gcsfs is doing the right thing.
What does exist at that location is attributes.json - is this a V3 thing? I can see it has some zarr-like information in it, but not the usual .z* files; e.g., here is the one inside s0:
{"axes":["x","y","z"],
"blockSize":[128,128,128],
"compression":{"blocksize":0,"clevel":9,"cname":"zstd","shuffle":2,"type":"blosc"},
"dataType":"uint16",
"dimensions":[3020,3412,2829],
"neuroglancer-pipeline-version":"1"}
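Since that file is plain JSON, a stdlib-only sketch can pull out the key fields from the metadata above (the block/chunk arithmetic is an illustration, not an official N5 API):

```python
import json
import math

# The N5 attributes.json shown above (array metadata, not a Zarr ".zarray")
meta = json.loads("""{"axes":["x","y","z"],
"blockSize":[128,128,128],
"compression":{"blocksize":0,"clevel":9,"cname":"zstd","shuffle":2,"type":"blosc"},
"dataType":"uint16",
"dimensions":[3020,3412,2829],
"neuroglancer-pipeline-version":"1"}""")

# Number of chunk objects per axis, and in total
blocks = [math.ceil(d / b) for d, b in zip(meta["dimensions"], meta["blockSize"])]
print(blocks)             # [24, 27, 23]
print(math.prod(blocks))  # 14904 chunk objects in the bucket for s0
```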
> What does exist in the location is attributes.json - is this a V3 thing?

No, it's N5.
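For reference, the two layouts can be told apart from the store keys alone: Zarr v2 marks nodes with ".zgroup"/".zarray" objects, while N5 keeps everything in "attributes.json". A small stdlib sketch (the key listing is illustrative):

```python
# Hypothetical listing of keys at a dataset root, as a store mapper would show them.
keys = {"attributes.json", "0/0/0", "0/0/1"}  # illustrative N5-style layout

def detect_format(keys):
    """Guess the array format from which metadata objects are present."""
    if ".zarray" in keys or ".zgroup" in keys:
        return "zarr-v2"
    if "attributes.json" in keys:
        return "n5"
    return "unknown"

print(detect_format(keys))  # n5
```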
Is zarr.group expected to be able to read that?
Also, turning on the "gcsfs" logger would tell you which call is actually causing the exception - i.e., what zarr is trying to create:
import fsspec
fsspec.utils.setup_logging(logger_name="gcsfs")
Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.
> Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.
That's a question for @jbms :) I don't know much about tensorstore myself.
@dstansby you should probably look at N5FSStore, which is FSStore modified to support N5.
My apologies for presumptively pinging you, Martin - I didn't notice immediately that this was actually an N5 thing.
Apologies, the path should have "/s0" on the end. I also understand that I should use open_array instead, as I have an N5 array rather than a group. Updated code (which gives me the same error):
import gcsfs
import zarr
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
fs.ls(bucket) # Works
store = fs.get_mapper(root=bucket)
arr = zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")
So I'm not sure how I can combine an N5 store with the GCSFileSystem?
> Side issue, @d-v-b : what are the prospects of tensorstore reading kerchunk manifests in JSON or parquet? This doesn't belong here, but I wonder whether there has been any thought along that.
That is definitely something we could support pretty easily, and has been on the TODO list.
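For context, a kerchunk "version 1" reference manifest is just JSON mapping store keys to either inline data or [url, offset, length] triples, so supporting it is largely a matter of resolving those references. A minimal stdlib sketch with illustrative values (the bucket path and byte ranges below are made up):

```python
import json

# A minimal kerchunk version-1 reference manifest (illustrative values):
# each key maps either to inline bytes (a string) or to [url, offset, length].
manifest = json.loads("""{
    "version": 1,
    "refs": {
        ".zgroup": "{\\"zarr_format\\": 2}",
        "data/0.0": ["gs://some-bucket/file.n5", 0, 2097152]
    }
}""")

for key, ref in manifest["refs"].items():
    if isinstance(ref, str):
        print(key, "-> inline,", len(ref), "bytes")
    else:
        url, offset, length = ref
        print(key, f"-> {url} @ {offset} ({length} bytes)")
```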
@dstansby does this work for you?
import zarr
import gcsfs
bucket = "ucl-hip-ct-35a68e99feaae8932b1d44da0358940b"
fs = gcsfs.GCSFileSystem(project='ucl-hip-ct', token='anon', access='read_only')
store = zarr.N5FSStore(url=bucket, fs=fs)
zarr.open_array(store=store, path="LADAF-2020-27/kidney-left/25.08um_complete-organ_bm05/s0")
Thanks, it did! I've opened a PR to the docs to make sure the working example doesn't get lost to a closed issue 😄
I transferred this issue from zarr-python to n5py.
Zarr version: 2.17.0
Numcodecs version: 0.12.1
Python version: 3.11.7
Operating system: macOS
Installation: using conda
Description
I am trying to access a read-only Google Cloud Storage bucket, but my code is failing. It looks like it's because zarr is trying to write something to the bucket, but I haven't properly worked out what's going wrong.
If I'm doing something wrong, it would be nice to add an example to the documentation to make this easy to do in the future.
Steps to reproduce
Additional output: No response