hdmf-dev / hdmf-zarr

Zarr I/O backend for HDMF
https://hdmf-zarr.readthedocs.io/

Fix linking for FSSpec #138

Closed alejoe91 closed 9 months ago

alejoe91 commented 10 months ago

Motivation

The ZarrIO.resolve_ref() function assumes paths are local. This PR adds an is_remote() method that checks whether the zarr store is an FSStore, and handles remote paths and links correctly.

Fixes #134
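A hypothetical sketch of the is_remote() check described above, assuming the zarr v2 storage API (zarr.storage.FSStore); the actual names and structure of the PR's implementation may differ:

```python
def is_remote(store) -> bool:
    """Return True if the zarr store is an FSStore (i.e., possibly remote)."""
    try:
        from zarr.storage import FSStore  # fsspec-backed store in zarr v2
    except ImportError:
        return False
    return isinstance(store, FSStore)

# A plain dict store is local, so links would resolve as filesystem paths.
local = is_remote({})
```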

How to test the behavior?

```python
from hdmf_zarr import NWBZarrIO

remote_zarr_location = "s3://aind-open-data/ecephys_625749_2022-08-03_15-15-06_nwb_2023-05-16_16-34-55/ecephys_625749_2022-08-03_15-15-06_nwb/ecephys_625749_2022-08-03_15-15-06_experiment1_recording1.nwb.zarr/"

with NWBZarrIO(remote_zarr_location, "r") as io:
    nwbfile = io.read()

print(nwbfile)
```


alejoe91 commented 10 months ago

Should I add a test?

codecov-commenter commented 10 months ago

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison: base (b51e127) 86.10% vs. head (c1204f5) 86.24%. Report is 4 commits behind head on dev.

| Files | Patch % | Lines |
|---|---|---|
| tests/unit/utils.py | 71.42% | 2 Missing :warning: |
Additional details and impacted files

```diff
@@            Coverage Diff             @@
##              dev     #138      +/-   ##
==========================================
+ Coverage   86.10%   86.24%   +0.13%
==========================================
  Files          13       14       +1
  Lines        3189     3221      +32
==========================================
+ Hits         2746     2778      +32
  Misses        443      443
```

:umbrella: View full report in Codecov by Sentry.

alejoe91 commented 10 months ago

> Should I add a test?

Added one using the Allen Institute dataset. Not sure if and where you guys want to add it to the CI, since it's currently quite slow. Maybe it would be better to keep this as a placeholder and run the test once we have a small test file stored somewhere?

oruebel commented 10 months ago

Thanks for taking a stab at this! We can work on getting a small file for testing on DANDI hopefully while we are at SfN. The PR looks overall good to me.

oruebel commented 10 months ago

@alejoe91 When running the test locally, I get the following error:

FAILED tests/unit/test_fsspec_streaming.py::TestFSSpecStreaming::test_fsspec_streaming - botocore.exceptions.NoCredentialsError: Unable to locate credentials

Is there a way to avoid the need for credentials in the test and/or would using a file on DANDI fix this?

I made a few minor changes to fix flake8 warnings (to avoid unused import warnings) and added fsspec and s3fs as optional requirements.

With regard to the time the test takes, I think this is OK for now. I would be in favor of merging the PR and creating a separate issue for a smaller test file to speed up the test. In other words, aside from the credential issue with the test, this PR looks good to me.
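For public S3 buckets, one common way to avoid the NoCredentialsError is anonymous access through s3fs: `anon` is a real s3fs storage option, but whether and how NWBZarrIO forwards `storage_options` is an assumption based on this PR's discussion, so the read call is shown commented out:

```python
# s3fs supports anonymous reads of public buckets via storage options.
storage_options = {"anon": True}  # no AWS credentials needed

# Assumed call signature (depends on how this PR propagates the options):
# with NWBZarrIO(remote_zarr_location, "r", storage_options=storage_options) as io:
#     nwbfile = io.read()
```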

oruebel commented 10 months ago

The credential error also occurred in the GitHub action here:

https://github.com/hdmf-dev/hdmf-zarr/actions/runs/6838973516/job/18596637194?pr=138#step:7:119

alejoe91 commented 10 months ago

I see! I think I haven't propagated the storage_options parameter to all zarr.open() calls! I'll push a fix tomorrow.
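A minimal pure-Python stand-in (not the actual hdmf-zarr code) illustrating why this matters: every nested open call must forward the same storage_options, otherwise inner opens fall back to default (credentialed) access and raise NoCredentialsError on anonymous reads:

```python
def open_array(path, storage_options=None):
    # Stand-in for zarr.open(path, storage_options=storage_options).
    return {"path": path, "storage_options": storage_options or {}}

def resolve_ref(path, storage_options=None):
    # Before the fix: open_array(path) silently dropped storage_options.
    # After the fix: propagate them explicitly to every nested open.
    return open_array(path, storage_options=storage_options)

linked = resolve_ref("s3://bucket/file.zarr", storage_options={"anon": True})
```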

oruebel commented 9 months ago

Just for reference, to help create a test with DANDI:

Example code showing how to get an S3 URL for a Zarr file read from DANDI:

```python
from dandi.dandiapi import DandiAPIClient
from dandi.consts import known_instances

staging_instance = known_instances["dandi-staging"]
dandiset_id = "204919"
version_id = "draft"

client = DandiAPIClient(dandi_instance=staging_instance)
dandiset = client.get_dandiset(dandiset_id=dandiset_id, version_id=version_id)
zarr_asset = dandiset.get_asset_by_path(path="test_read_nwbfile/test2.nwb.zarr")
metadata = zarr_asset.get_metadata()
content = metadata.contentUrl[1]
s3_url = f"s3://dandiarchive{content.path}"
```