hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io

[Bug]: Support writing multidimensional string datasets and attributes #1096

Open rly opened 2 months ago

rly commented 2 months ago

What happened?

While trying to resolve https://github.com/NeurodataWithoutBorders/pynwb/pull/1886/ in HDMF, I discovered that, on build, a multidimensional list or array of strings that maps to a multidimensional dataset/attribute spec is converted to a 1-D list or array. Each inner row is replaced by its string representation:

[['a', 'b'], ['c', 'd']] -> ["['a', 'b']", "['c', 'd']"]
np.array([['a', 'b'], ['c', 'd']]) -> array(["['a' 'b']", "['c' 'd']"], dtype='<U9')
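
For illustration only, here is a minimal sketch of one way output like this can arise. It assumes a string conversion is applied to each top-level element instead of to each leaf value. That is a guess at the mechanism, not HDMF's actual build code, and the helper names `convert_top_level` and `convert_leaves` are hypothetical. It covers only the nested-list case; the ndarray case is analogous, with NumPy's row formatting in place of Python's.

```python
def convert_top_level(value):
    """Hypothetical helper: apply str() to each top-level element.

    For a nested list, each inner row is stringified as a whole,
    which matches the 1-D output reported above.
    """
    return [str(item) for item in value]


def convert_leaves(value):
    """Hypothetical helper: apply str() to each leaf, keeping the nesting."""
    if isinstance(value, (list, tuple)):
        return [convert_leaves(item) for item in value]
    return str(value)


nested = [['a', 'b'], ['c', 'd']]

# Row-wise conversion collapses the second dimension into strings.
flattened = convert_top_level(nested)
print(flattened)   # ["['a', 'b']", "['c', 'd']"]

# Leaf-wise conversion preserves the 2-D structure.
preserved = convert_leaves(nested)
print(preserved)   # [['a', 'b'], ['c', 'd']]
```

The recursive version is the behavior the tests below expect: the built dataset should keep the same shape as the input.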

Steps to Reproduce

    def test_build_2d_lol(self):
        bar_spec = GroupSpec(
            doc='A test group specification with a data type',
            data_type_def='Bar',
            datasets=[
                DatasetSpec(
                    doc='an example dataset',
                    dtype='text',
                    name='data',
                    shape=(None, None),
                    attributes=[AttributeSpec(name='attr2', doc='an example integer attribute', dtype='int')],
                )
            ],
            attributes=[AttributeSpec(name='attr1', doc='an example string attribute', dtype='text')],
        )
        type_map = self.customSetUp(bar_spec)
        type_map.register_map(Bar, BarMapper)
        bar_inst = Bar('my_bar', [['a', 'b'], ['c', 'd']], 'value1', 10)
        builder = type_map.build(bar_inst)
        self.assertEqual(builder.get('data').data, [['a', 'b'], ['c', 'd']])

    def test_build_2d_ndarray(self):
        bar_spec = GroupSpec(
            doc='A test group specification with a data type',
            data_type_def='Bar',
            datasets=[
                DatasetSpec(
                    doc='an example dataset',
                    dtype='text',
                    name='data',
                    shape=(None, None),
                    attributes=[AttributeSpec(name='attr2', doc='an example integer attribute', dtype='int')],
                )
            ],
            attributes=[AttributeSpec(name='attr1', doc='an example string attribute', dtype='text')],
        )
        type_map = self.customSetUp(bar_spec)
        type_map.register_map(Bar, BarMapper)
        bar_inst = Bar('my_bar', np.array([['a', 'b'], ['c', 'd']]), 'value1', 10)
        builder = type_map.build(bar_inst)
        np.testing.assert_array_equal(builder.get('data').data, np.array([['a', 'b'], ['c', 'd']]))

Traceback

No response

Operating System

macOS

Python Executable

Conda

Python Version

3.12

Package Versions

No response

rly commented 2 months ago

I'm working on a fix already.