rcpeene opened this issue 1 month ago
Thanks for including the code and traceback. The issue appears to be due to a conversion between data types when exporting from Zarr to HDF5:

```
ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original dtype
```

The error originates from this line in the HDMF library when writing to disk:

```
File "/opt/conda/lib/python3.9/site-packages/hdmf/backends/hdf5/h5tools.py", line 1490, in __list_fill__
    dset[:] = data
```
> I can't share the nwb file for licensing reasons
Since you can't share the original data file, we'll probably need your help to get to the root of this.

Option 1: if you could share a "dummy" file that has the same issue, we could investigate that instead. We don't need the real data to debug; a file that looks similar and raises the same error would be fine.

Option 2: retrace the steps on your end so we can at least figure out which case causes the error and reproduce it here. A first step would be to output all the properties of `dset` and `data` when the exception occurs, e.g., by adding a print statement in `h5tools.py` just before the exception is raised at line 1492, something along the lines of:

```python
print("parent", parent, "\n",
      "name", name, "\n",
      "dset", dset, "\n",
      "dset.dtype", dset.dtype, "\n",
      "data.dtype", data.dtype, "\n",
      "data", data)
```

That way we can see which data types are being converted.
@oruebel I've received permission to share the file directly with you for examination, as long as it isn't distributed. Would a OneDrive link work?
Sure, a OneDrive link should be fine. Feel free to send it via Slack or email (oruebel@lbl.gov) so we can take a look. We'll treat the data confidentially and not share it with others.
invite email sent
Any updates here? It's one of the last things holding up our data pipeline.
As far as I can tell, the issue occurs when copying `/intervals/flash_block_presentations/tags`. I'll need to do a bit more digging to confirm, but my guess is that the fix will likely need to be in HDMF. A possible workaround may be to wrap `/intervals/flash_block_presentations/tags` with `H5DataIO` before calling export to explicitly set the dtype, but I have not tested this yet; a rough sketch follows.
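A rough, untested sketch of that workaround, assuming a recent HDMF where `Container.set_data_io` is available; the file paths, the `data_io_kwargs`, and the choice of dtype are placeholders rather than a verified fix:

```python
import h5py
from hdmf.backends.hdf5 import H5DataIO
from hdmf_zarr.nwb import NWBZarrIO
from pynwb import NWBHDF5IO

with NWBZarrIO("source.nwb.zarr", mode="r") as read_io:
    nwbfile = read_io.read()
    tags = nwbfile.intervals["flash_block_presentations"]["tags"]
    # Re-wrap the column's data with H5DataIO so the HDF5 backend gets
    # an explicit variable-length string dtype instead of inferring one
    # from the Zarr array (assumed usage, not a verified fix).
    tags.set_data_io("data", H5DataIO, data_io_kwargs={"dtype": h5py.string_dtype()})

    with NWBHDF5IO("converted.nwb", mode="w") as export_io:
        # link_data=False copies datasets instead of linking them, which
        # is required when exporting across backends (Zarr -> HDF5).
        export_io.export(src_io=read_io, nwbfile=nwbfile, write_args={"link_data": False})
```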
What is confusing to me is that when printing from `HDF5IO` it shows

```
<zarr.core.Array '/intervals/flash_block_presentations/tags' (1011,) <U0 read-only>
```

but when opening the file with Zarr manually it shows

```
<zarr.core.Array '/intervals/flash_block_presentations/tags' (1011,) object read-only>
```

I'm not sure why the dtype would be `<U0` instead of `object`. It looks like, because of this, it is actually the read of the data from Zarr itself that is failing.
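For reference, this is what inspecting the column with Zarr directly looks like (file path assumed):

```python
import zarr

root = zarr.open("source.nwb.zarr", mode="r")
tags = root["intervals/flash_block_presentations/tags"]
print(tags)        # <zarr.core.Array '/intervals/flash_block_presentations/tags' (1011,) object read-only>
print(tags.dtype)  # object -- Zarr stores variable-length strings with dtype=object
```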
It appears the issue is that `ObjectMapper` in HDMF uses `.astype('U')` to enforce that the dtype of the dataset is unicode, as specified in the schema. For Zarr datasets this fails because Zarr does not support `'U'` as a dtype for variable-length strings.
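A small demonstration of the suspected mechanism, assuming Zarr 2.x, where `Array.astype` returns an on-the-fly conversion view; the `'<U0'` dtype matches what was printed from `HDF5IO` above:

```python
import numpy as np
import zarr
from numcodecs import VLenUTF8

# Variable-length strings in Zarr are stored with dtype=object plus an
# object codec; there is no variable-length 'U' dtype.
z = zarr.array(np.array(["a", "bb", "ccc"], dtype=object), object_codec=VLenUTF8())
print(z.dtype)  # object

# Requesting a length-less unicode conversion, as ObjectMapper does,
# yields a view whose dtype is '<U0'; reading the data back through
# such a view is what then appears to fail.
view = z.astype("U")
print(view.dtype)  # <U0
```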
I submitted a PR on HDMF for this: https://github.com/hdmf-dev/hdmf/pull/1171. With this change I was able to convert the file to HDF5.
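For completeness, with that fix in place the conversion is expected to work with the usual hdmf-zarr export pattern, sketched here with placeholder paths and no wrapper needed:

```python
from hdmf_zarr.nwb import NWBZarrIO
from pynwb import NWBHDF5IO

with NWBZarrIO("source.nwb.zarr", mode="r") as read_io:
    with NWBHDF5IO("converted.nwb", mode="w") as export_io:
        # link_data=False copies the data instead of creating links,
        # which is required when switching storage backends.
        export_io.export(src_io=read_io, write_args={"link_data": False})
```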
What happened?
Trying to export a Zarr NWB file to HDF5, but it raises an error.
Steps to Reproduce
I can't share the nwb file for licensing reasons
Operating System
Linux
Python Executable
Python
Python Version
3.9
Package Versions
No response