hdmf-dev / hdmf-zarr

Zarr I/O backend for HDMF
https://hdmf-zarr.readthedocs.io/

[Feature]: S3 streaming support via fsspec #134

Closed oruebel closed 9 months ago

oruebel commented 10 months ago

What would you like to see added to HDMF-ZARR?

Support streaming using fsspec

Is your feature request related to a problem?

https://github.com/NeurodataWithoutBorders/helpdesk/discussions/56#discussioncomment-7299446

What solution would you like?

Add FSStore to support fsspec-based streaming from S3, and/or allow passing a zarr.Group backed by a read-only store to ZarrIO.__init__(path=...)

https://github.com/hdmf-dev/hdmf-zarr/blob/70bf35b60ed2c2eaae7b12080bc1f4cc3d89ba3e/src/hdmf_zarr/backend.py#L66C1-L69C47
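A minimal sketch of what fsspec-based read-only access could look like using zarr's FSStore; the bucket path is a placeholder, and passing the resulting group to ZarrIO is an assumption about the proposed API, since ZarrIO currently expects a local path:

import zarr
from zarr.storage import FSStore

# Placeholder S3 path; extra keyword arguments (e.g. anon=True) are passed
# through to fsspec as storage options.
store = FSStore("s3://my-bucket/my-file.nwb.zarr", mode="r", anon=True)
root = zarr.open_group(store, mode="r")

# Hypothetical usage once ZarrIO accepts a pre-opened read-only group:
# from hdmf_zarr import ZarrIO
# with ZarrIO(path=root, mode="r") as io:
#     ...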

Do you have any interest in helping implement the feature?

Yes.


alejoe91 commented 10 months ago

@oruebel this came up elsewhere (see comment). Note that zarr natively supports reading from the cloud. On my side, this works:

import zarr

remote_zarr_location = "s3://aind-open-data/ecephys_625749_2022-08-03_15-15-06_nwb_2023-05-16_16-34-55/ecephys_625749_2022-08-03_15-15-06_nwb/ecephys_625749_2022-08-03_15-15-06_experiment1_recording1.nwb.zarr/"

zarr_root = zarr.open(remote_zarr_location)

When trying with the NWBZarrIO wrapper, some links fail to resolve because the link-resolution function assumes the file/folder is sitting on local disk. It would probably be an easy fix to make it work; I can give it a try!
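One possible direction, sketched below rather than taken from the actual hdmf-zarr internals: have the link-resolution step detect remote URLs and open them through fsspec instead of assuming a local directory. The helper name and the URL-prefix check are hypothetical.

import zarr
from zarr.storage import FSStore


def open_zarr_source(path):
    """Open a Zarr group from a local path or a remote (s3://, gs://, http(s)://) URL."""
    if str(path).startswith(("s3://", "gs://", "http://", "https://")):
        # Remote source: go through fsspec via zarr's FSStore, read-only
        store = FSStore(str(path), mode="r")
        return zarr.open_group(store, mode="r")
    # Local source: plain directory store on disk
    return zarr.open_group(str(path), mode="r")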