hdmf-dev / hdmf-zarr

Zarr I/O backend for HDMF
https://hdmf-zarr.readthedocs.io/

[Bug]: ValueError when using `export` function (for `subject.date_of_birth`) #160

Closed by alejoe91 7 months ago

alejoe91 commented 7 months ago

What happened?

Hi guys,

Weird bug when trying to export an NWB Zarr file to another one; see the steps to reproduce below.

The dataset that triggers the error is subject.date_of_birth, which is wrongly assigned an int dtype.

Steps to Reproduce

from datetime import datetime
from hdmf_zarr import NWBZarrIO
from pynwb.file import Subject, NWBFile

subject = Subject(
    subject_id="001",
    species="Mus musculus",
    sex="M",
    date_of_birth=datetime.now(),
    age="P1D",
    description=None,
)

nwbfile = NWBFile(
    session_description="Test File",
    identifier="0000",
    session_start_time=datetime.now(),
    subject=subject,
)

with NWBZarrIO("test_subject.nwb", "w") as io:
    io.write(nwbfile)

# export
with NWBZarrIO("test_subject.nwb", "r") as read_io:
    nwbfile = read_io.read()
    with NWBZarrIO("test_subject_export.nwb", "w") as export_io:
        export_io.export(src_io=read_io, nwbfile=nwbfile)

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[23], line 4
      2 nwbfile = read_io.read()
      3 with NWBZarrIO("test_subject_export.nwb", "w") as export_io:
----> 4     export_io.export(src_io=read_io, nwbfile=nwbfile)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf_zarr/nwb.py:67, in NWBZarrIO.export(self, **kwargs)
     65 nwbfile = popargs('nwbfile', kwargs)
     66 kwargs['container'] = nwbfile
---> 67 super().export(**kwargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf_zarr/backend.py:353, in ZarrIO.export(self, **kwargs)
    351 ckwargs = kwargs.copy()
    352 ckwargs['write_args'] = write_args
--> 353 super().export(**ckwargs)
    354 if cache_spec:
    355     self.__cache_spec()

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/backends/io.py:165, in HDMFIO.export(self, **kwargs)
    163 else:
    164     bldr = src_io.read_builder()
--> 165 self.write_builder(builder=bldr, **write_args)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf_zarr/backend.py:432, in ZarrIO.write_builder(self, **kwargs)
    428 f_builder, link_data, exhaust_dci, export_source, consolidate_metadata = getargs(
    429     'builder', 'link_data', 'exhaust_dci', 'export_source', 'consolidate_metadata', kwargs
    430 )
    431 for name, gbldr in f_builder.groups.items():
--> 432     self.write_group(
    433         parent=self.__file,
    434         builder=gbldr,
    435         link_data=link_data,
    436         exhaust_dci=exhaust_dci,
    437         export_source=export_source,
    438     )
    439 for name, dbldr in f_builder.datasets.items():
    440     self.write_dataset(
    441         parent=self.__file,
    442         builder=dbldr,
   (...)
    445         export_source=export_source,
    446     )

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf_zarr/backend.py:522, in ZarrIO.write_group(self, **kwargs)
    520 if subgroups:
    521     for subgroup_name, sub_builder in subgroups.items():
--> 522         self.write_group(
    523             parent=group,
    524             builder=sub_builder,
    525             link_data=link_data,
    526             exhaust_dci=exhaust_dci,
    527         )
    529 datasets = builder.datasets
    530 if datasets:

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf_zarr/backend.py:532, in ZarrIO.write_group(self, **kwargs)
    530 if datasets:
    531     for dset_name, sub_builder in datasets.items():
--> 532         self.write_dataset(
    533             parent=group,
    534             builder=sub_builder,
    535             link_data=link_data,
    536             exhaust_dci=exhaust_dci,
    537             export_source=export_source,
    538         )
    540 # write all links (haven implemented)
    541 links = builder.links

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf/utils.py:664, in docval.<locals>.dec.<locals>.func_call(*args, **kwargs)
    662 def func_call(*args, **kwargs):
    663     pargs = _check_args(args, kwargs)
--> 664     return func(args[0], **pargs)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf_zarr/backend.py:1064, in ZarrIO.write_dataset(self, **kwargs)
   1061 # write a 'regular' dataset without DatasetIO info
   1062 else:
   1063     if isinstance(data, (str, bytes)):
-> 1064         dset = self.__scalar_fill__(parent, name, data, options)
   1065     # Iterative write of a data chunk iterator
   1066     elif isinstance(data, AbstractDataChunkIterator):

File ~/anaconda3/envs/si/lib/python3.10/site-packages/hdmf_zarr/backend.py:1264, in ZarrIO.__scalar_fill__(self, parent, name, data, options)
   1261     io_settings['object_codec'] = self.__codec_cls()
   1263 dset = parent.require_dataset(name, shape=(1, ), dtype=dtype, **io_settings)
-> 1264 dset[:] = data
   1265 type_str = 'scalar'
   1266 dset.attrs['zarr_dtype'] = type_str

File ~/anaconda3/envs/si/lib/python3.10/site-packages/zarr/core.py:1497, in Array.__setitem__(self, selection, value)
   1495     self.set_orthogonal_selection(pure_selection, value, fields=fields)
   1496 else:
-> 1497     self.set_basic_selection(pure_selection, value, fields=fields)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/zarr/core.py:1593, in Array.set_basic_selection(self, selection, value, fields)
   1591     return self._set_basic_selection_zd(selection, value, fields=fields)
   1592 else:
-> 1593     return self._set_basic_selection_nd(selection, value, fields=fields)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/zarr/core.py:1983, in Array._set_basic_selection_nd(self, selection, value, fields)
   1977 def _set_basic_selection_nd(self, selection, value, fields=None):
   1978     # implementation of __setitem__ for array with at least one dimension
   1979 
   1980     # setup indexer
   1981     indexer = BasicIndexer(selection, self)
-> 1983     self._set_selection(indexer, value, fields=fields)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/zarr/core.py:2038, in Array._set_selection(self, indexer, value, fields)
   2035                 chunk_value = chunk_value[item]
   2037         # put data
-> 2038         self._chunk_setitem(chunk_coords, chunk_selection, chunk_value, fields=fields)
   2039 else:
   2040     lchunk_coords, lchunk_selection, lout_selection = zip(*indexer)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/zarr/core.py:2304, in Array._chunk_setitem(self, chunk_coords, chunk_selection, value, fields)
   2301     lock = self._synchronizer[ckey]
   2303 with lock:
-> 2304     self._chunk_setitem_nosync(chunk_coords, chunk_selection, value, fields=fields)

File ~/anaconda3/envs/si/lib/python3.10/site-packages/zarr/core.py:2308, in Array._chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields)
   2306 def _chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields=None):
   2307     ckey = self._chunk_key(chunk_coords)
-> 2308     cdata = self._process_for_setitem(ckey, chunk_selection, value, fields=fields)
   2310     # attempt to delete chunk if it only contains the fill value
   2311     if (not self.write_empty_chunks) and all_equal(self.fill_value, cdata):

File ~/anaconda3/envs/si/lib/python3.10/site-packages/zarr/core.py:2329, in Array._process_for_setitem(self, ckey, chunk_selection, value, fields)
   2323 if is_scalar(value, self._dtype):
   2324 
   2325     # setup array filled with value
   2326     chunk = np.empty_like(
   2327         self._meta_array, shape=self._chunks, dtype=self._dtype, order=self._order
   2328     )
-> 2329     chunk.fill(value)
   2331 else:
   2332 
   2333     # ensure array is contiguous
   2334     chunk = value.astype(self._dtype, order=self._order, copy=False)

ValueError: invalid literal for int() with base 10: b'2024-02-12T17:20:40.860039+01:00'
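
The last frame can be imitated without hdmf-zarr or zarr installed. This is only a minimal sketch of the failure mode, not the library's actual code path: it assumes the destination dataset was created with an integer dtype and that the stored value is the ISO 8601 byte string shown in the error message. The helper `fill_as_int` is hypothetical and merely stands in for filling an int-typed chunk with the value.

```python
from datetime import datetime

# Value copied from the error message in the traceback above.
raw = b"2024-02-12T17:20:40.860039+01:00"


def fill_as_int(value):
    """Hypothetical stand-in for writing `value` into an int-typed dataset."""
    return int(value)


# Coercing the timestamp to an integer fails with the same kind of ValueError.
try:
    fill_as_int(raw)
    error_message = None
except ValueError as err:
    error_message = str(err)

print(error_message)

# The same bytes are a valid ISO 8601 timestamp once decoded as text,
# which suggests the value is fine and only the target dtype is wrong.
parsed = datetime.fromisoformat(raw.decode("utf-8"))
print(parsed.isoformat())
```

If this reading is right, the stored value is intact and the problem lies in the dtype chosen for the exported dataset.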

Operating System

Linux

Python Executable

Conda

Python Version

3.10

Package Versions

hdmf-zarr 0.5.0, pynwb 2.5.0


oruebel commented 7 months ago

Thanks for the issue. Are you planning to dig into this or do you need me to take a look at this later this week?

I'm a bit surprised this is not being caught by our conversion tests.

https://github.com/hdmf-dev/hdmf-zarr/blob/300070ba8be3ebf5525defcd4a3f3339a36b3272/tests/unit/test_io_convert.py#L271

So step 1 seems to be to add the example you provided as a unit test case. I would have expected that at the very least the tutorial for converting should encounter this.

https://github.com/hdmf-dev/hdmf-zarr/blob/300070ba8be3ebf5525defcd4a3f3339a36b3272/docs/gallery/plot_convert_nwb_hdf5.py#L108-L112

alejoe91 commented 7 months ago

Hi @oruebel

It'd be great if you can take a look. Probably the test files don't have the subject defined!

oruebel commented 7 months ago

I started #161 to dig into this. At this point, I can confirm that I can reproduce the problem, and I've added a unit test. It looks like the issue may be specific to export from Zarr to Zarr. Converting from HDF5 to Zarr and from Zarr to HDF5, at least, does not raise the same error.

oruebel commented 7 months ago

@alejoe91 can you please review #161?