bjhardcastle opened this issue 4 months ago.
I am able to open a file with consolidated metadata using S3. I'll look into this.
Opening works, but reading fails for a certain file, though the error is not the same. I do not believe it is related.
@bjhardcastle I'm not able to reproduce the same error. Is it possible for you to create a public example (a zarr file I can use)?
We have passing tests that read consolidated-metadata files, so I am not sure what you are facing.
@bjhardcastle I think this is due to some `export` functions on our side being called without `write_args={'link_data': False}`, as reported in this issue: https://github.com/hdmf-dev/hdmf-zarr/issues/179, and not to a specific read of consolidated data from S3.
On our end, all export functions have now been updated to force `link_data: False`. Can you try to generate a new NWB file with the pipeline and see if the problem persists? Once a fix in hdmf-zarr is in place, we can drop the `write_args`.
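As a sketch of the workaround being described (hypothetical file paths; the `export` call with `write_args=dict(link_data=False)` follows the pattern from the hdmf-zarr export tutorial):

```python
import hdmf_zarr

# Hypothetical paths, for illustration only
src_path = "raw.nwb.zarr"
dst_path = "exported.nwb.zarr"

with hdmf_zarr.NWBZarrIO(src_path, mode="r") as read_io:
    with hdmf_zarr.NWBZarrIO(dst_path, mode="w") as export_io:
        # Copy the data into the new file instead of writing links back
        # to the source file (the behavior reported in hdmf-zarr #179)
        export_io.export(src_io=read_io, write_args=dict(link_data=False))
```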
@mavaylon1 Are you talking about these tests? https://github.com/hdmf-dev/hdmf-zarr/commit/2ba6bc5cc83a68b5d5a4ec1e9816cd2913f8e7db
They aren't sufficient:
Here's an MRE with reading added to your test:
```python
# python -m venv .venv-hdmf-test
# .venv-hdmf-test/Scripts/activate (Windows)
# source .venv-hdmf-test/bin/activate (Linux)
# python -m ensurepip
# python -m pip install hdmf-zarr fsspec s3fs
import hdmf_zarr
import zarr

s3_path = "https://dandiarchive.s3.amazonaws.com/zarr/ccefbc9f-30e7-4a4c-b044-5b59d300040b/"

# default mode 'r' uses consolidated metadata
with hdmf_zarr.NWBZarrIO(s3_path, mode='r') as read_io:
    read_io.open()
    assert isinstance(read_io.file.store, zarr.storage.ConsolidatedMetadataStore)
    try:
        # this fails:
        nwb = read_io.read()
    except Exception as exc:
        print(repr(exc))
        # ValueError: No data_type found for builder root

# mode 'r-' skips consolidated metadata
with hdmf_zarr.NWBZarrIO(s3_path, mode='r-') as read_io:
    read_io.open()
    assert isinstance(read_io.file.store, zarr.storage.FSStore)
    try:
        # this fails:
        nwb = read_io.read()
    except Exception as exc:
        print(repr(exc))
        # hdmf.backends.errors.UnsupportedOperation: Cannot build data. There are no values.

# the zarr file is empty:
z = zarr.open(s3_path, mode='r')
assert not tuple(z.keys())
```
@bjhardcastle I am talking about our roundtrip tests, such as https://github.com/hdmf-dev/hdmf-zarr/blob/8ca578733db5078d1ff6d2dfb47407a680c58caf/tests/unit/base_tests_zarrio.py#L331C9-L331C40
In hdmf-zarr our default is to consolidate metadata, and that test passes when reading a file with a `zarr.storage.ConsolidatedMetadataStore`.
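For context on why the local tests pass, plain zarr shows the same behavior; a small sketch with a hypothetical file name (zarr v2 API):

```python
import zarr

# Write a local zarr file and consolidate its metadata into .zmetadata
store = zarr.DirectoryStore("example.zarr")
root = zarr.group(store=store, overwrite=True)
root.create_dataset("data", data=[1, 2, 3])
zarr.consolidate_metadata(store)

# Reading it back locally with consolidated metadata works fine
g = zarr.open_consolidated(store, mode="r")
assert isinstance(g.store, zarr.storage.ConsolidatedMetadataStore)
print(list(g.keys()))  # ['data']
```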
Thanks for the code. I will look into this.
Is that writing to an S3 bucket though?
@bjhardcastle No, but my point is that consolidated metadata works locally. We don't have a read test for this case, and I couldn't reproduce it without an S3 file.
But you gave one, so this should be enough for me to reproduce the error and fix it. :)
Yes, it works locally because `is_remote()` returns False, which is correct.
When the file is remote, `is_remote()` also returns False, because `ConsolidatedMetadataStore` is not a subclass of `FSStore`. This messes up the paths to the underlying data.
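To illustrate, a sketch of a check that would account for the wrapper store (one possible approach, not necessarily the fix hdmf-zarr adopted; in zarr v2, `ConsolidatedMetadataStore` keeps the wrapped store in its `.store` attribute):

```python
from zarr.storage import ConsolidatedMetadataStore, FSStore

def is_remote(self) -> bool:
    """Return True if the file is stored remotely (sketch)."""
    store = self.file.store
    if isinstance(store, ConsolidatedMetadataStore):
        # zarr.open_consolidated() wraps the real store, so unwrap it first
        store = store.store
    return isinstance(store, FSStore)
```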
The file I gave you is the one from your tests, but it's empty. I'm trying to upload a non-empty zarr NWB file via the dandi CLI now, but following the docs hasn't been successful yet. If you have any working code to do that, I'd appreciate it.
I have to reproduce the error to comment on your findings regarding `is_remote()`. I will see what I can do to help on the DANDI side.
@bjhardcastle At any point did you export?
Export what?
Did you export a file to get to your current file?
Sorry, I don't know what you mean. Could you be more specific?
> I have to reproduce the error to comment on your findings regarding `is_remote()`
Can't you see that the code doesn't make sense? The type of the zarr store object is not an indicator of whether the data is remote or not.
As a result, lines like this make paths that are completely wrong: https://github.com/hdmf-dev/hdmf-zarr/blob/8ca578733db5078d1ff6d2dfb47407a680c58caf/src/hdmf_zarr/backend.py#L719-L721
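For illustration only (hypothetical keys, not the exact code at that link), treating a remote URL as a local filesystem path mangles it:

```python
import os

s3_path = "https://dandiarchive.s3.amazonaws.com/zarr/ccefbc9f-30e7-4a4c-b044-5b59d300040b/"

# If is_remote() wrongly reports False, the source gets run through local
# path handling; joining and normalizing a URL this way breaks it:
print(os.path.abspath(os.path.join(s3_path, "acquisition", "data")))
# On Windows this yields something like
# C:\...\https:\dandiarchive.s3.amazonaws.com\zarr\...\acquisition\data
```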
That could be the case. I have not taken a deep look at what the issue is. I usually like to reproduce the error so that any changes I make to address that error can be tested.
In any case, I will find some time next week to look at this, as I have higher-priority issues I've already committed to. Let me know when you figure out uploading your file; I will look into that as well.
Within NWB you can edit files and export them into a new file. If you did not make the file and are unsure, that is fine. It could be helpful to know, and I would suggest finding out if possible.
I invite you to also review our Code of Conduct.
@bjhardcastle @mavaylon1 @alejoe91 thank you for the discussion to help us better understand the issue. We'll review it in our team meeting next week to make a plan for developing a fix and get back to you. I appreciate everyone's patience and hope you have a great weekend.
@mavaylon1 if it helps, I got the same error on an NWB file produced by a double export (first adding electrical series, second adding units), even when opening the file locally.
Here's a link with the file so you can download it: https://drive.google.com/drive/folders/1f8_92cXrEvOdJWJvghNVNR_sDjpejmEO?usp=sharing
Another update: switching the units export to an append mechanism seems to produce NWB files without the issue.
To summarize, we currently produce NWB files in multiple steps:

1. Add the electrical series (with `export`). After this step, the file can be read with no errors.
2. Add the Units table (with `export`; this breaks reading, but it works with `append`; see the sketch after this list).

Here are the NWB zarr files produced (see the Google Drive link above).

So it seems that the issue is with adding the Units table with export. I hope this helps!
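As a rough sketch of what the append-based alternative could look like (hypothetical path and spike times, using pynwb's standard read-modify-write pattern rather than our actual pipeline code):

```python
import hdmf_zarr

# Hypothetical existing NWB zarr file
nwb_path = "ecephys.nwb.zarr"

with hdmf_zarr.NWBZarrIO(nwb_path, mode="a") as io:
    nwbfile = io.read()
    # Add units in place instead of exporting to a new file
    nwbfile.add_unit(spike_times=[0.1, 0.2, 0.3])
    io.write(nwbfile)
```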
Thank you @alejoe91 for debugging this issue further and providing these additional details. We'll take a look.
@bjhardcastle we are actively investigating adjacent issues that may fix your ticket. Our goal is to either have a solution or a clear path to one by the end of next week.
@mavaylon1 any lead on this?
@alejoe91 @bjhardcastle The issue that is stalling this is our implementation of export. I would find out who generated/exported the files, as the files in general might not be valid.
What happened?

#185 highlights the need for a test for this, so maybe you're already aware of this issue.

The problem stems from `self.is_remote()` incorrectly returning False. Adding `ConsolidatedMetadataStore` to the type check here is enough to get it working again: https://github.com/hdmf-dev/hdmf-zarr/blob/afaab5fbca03a0979d9f234e13e43027f545c366/src/hdmf_zarr/backend.py#L185

Steps to Reproduce
Traceback
Operating System
Windows
Python Executable
Python
Python Version
3.11
Package Versions
No response
Code of Conduct