hdmf-dev / hdmf

The Hierarchical Data Modeling Framework
http://hdmf.readthedocs.io

nested type definitions #73

Open bendichter opened 5 years ago

bendichter commented 5 years ago

There are two ways to define a new Container type in the spec language. You can either define a Container type in the /groups of the schema with a _def field (and optionally an _inc) and then use it in another place with an _inc field (and no _def field), or you can define it and use it at the same time by including _def and _inc fields in a nested spot of the schema. Most types are defined in /groups and /datasets and included separately, but there are cases in the NWB:N core schema where types are defined as they are used, e.g.

https://github.com/NeurodataWithoutBorders/pynwb/blob/a081b39803f0ffb535b2198a2fdd81007ffa442e/src/pynwb/data/nwb.file.yaml#L234-L243
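For concreteness, the two styles look roughly like this in the spec language (the `MyType`/`MyHolder` names are purely illustrative, not from any real schema):

```yaml
groups:
# Style 1: define the type at the top level of /groups, include it elsewhere by reference
- data_type_def: MyType            # definition only
  doc: a standalone type definition
- data_type_def: MyHolder
  doc: a holder that includes MyType by reference
  groups:
  - data_type_inc: MyType          # include only; no new type is created here
    doc: an included MyType

# Style 2: define the type inline, at the point of use (_def and _inc together)
- data_type_def: MyHolder2
  doc: a holder that defines a new type inline
  groups:
  - data_type_def: MyType2         # nested definition ...
    data_type_inc: MyType          # ... that also extends MyType
    doc: a type defined where it is used
```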

As far as I can tell, this capability is purely for convenience and offers no additional features or expressive power for the defined types. Currently our get_class (i.e. type_map.get_container_cls in hdmf) function only works at the surface level (in '/groups') and will not register nested Container definitions. As a result, there are schemas that follow the schema language rules, but for which the API auto-generation will not work. I see a few possible solutions.

Here is a minimal example. The last 4 commands are the critical piece.

from hdmf.build import BuildManager, TypeMap, ObjectMapper
from hdmf.spec.namespace import NamespaceCatalog, SpecNamespace
from hdmf.spec.spec import AttributeSpec, DatasetSpec, GroupSpec, ZERO_OR_MANY, ONE_OR_MANY
from hdmf.spec.catalog import SpecCatalog

# FooFile is used below when registering container types, so it is imported here as well
from tests.unit.test_utils import Foo, FooBucket, FooFile, CORE_NAMESPACE

foo_spec = GroupSpec('A test group specification with a data type',
                     data_type_def='Foo',
                     datasets=[DatasetSpec('an example dataset',
                                           'int',
                                           name='my_data',
                                           attributes=[AttributeSpec('attr2',
                                                                     'an example integer attribute',
                                                                     'int')])],
                     attributes=[AttributeSpec('attr1', 'an example string attribute', 'text')])

tmp_spec = GroupSpec('A subgroup for Foos',
                     name='foo_holder',
                     groups=[GroupSpec('the Foos in this bucket', data_type_inc='Foo', quantity=ZERO_OR_MANY)])

bucket_spec = GroupSpec('A test group specification for a data type containing data type',
                        data_type_def='FooBucket',
                        groups=[tmp_spec])

class BucketMapper(ObjectMapper):
    def __init__(self, spec):
        super(BucketMapper, self).__init__(spec)
        foo_spec = spec.get_group('foo_holder').get_data_type('Foo')
        self.map_spec('foos', foo_spec)

file_spec = GroupSpec("A file of Foos contained in FooBuckets",
                      name='root',
                      data_type_def='FooFile',
                      groups=[GroupSpec('Holds the FooBuckets',
                                        name='buckets',
                                        groups=[GroupSpec("One or more FooBuckets",
                                                          data_type_inc='FooBucket',
                                                          quantity=ONE_OR_MANY)])])

class FileMapper(ObjectMapper):
    def __init__(self, spec):
        super(FileMapper, self).__init__(spec)
        bucket_spec = spec.get_group('buckets').get_data_type('FooBucket')
        self.map_spec('buckets', bucket_spec)

spec_catalog = SpecCatalog()
spec_catalog.register_spec(foo_spec, 'test.yaml')
spec_catalog.register_spec(bucket_spec, 'test.yaml')
spec_catalog.register_spec(file_spec, 'test.yaml')
namespace = SpecNamespace(
    'a test namespace',
    CORE_NAMESPACE,
    [{'source': 'test.yaml'}],
    catalog=spec_catalog)
namespace_catalog = NamespaceCatalog()
namespace_catalog.add_namespace(CORE_NAMESPACE, namespace)
type_map = TypeMap(namespace_catalog)

type_map.register_container_type(CORE_NAMESPACE, 'Foo', Foo)
type_map.register_container_type(CORE_NAMESPACE, 'FooBucket', FooBucket)
type_map.register_container_type(CORE_NAMESPACE, 'FooFile', FooFile)

type_map.register_map(FooBucket, BucketMapper)
type_map.register_map(FooFile, FileMapper)

manager = BuildManager(type_map)

spec_catalog = manager.namespace_catalog.get_namespace(CORE_NAMESPACE).catalog
foo_spec = spec_catalog.get_spec('Foo')
# Baz1 class contains an object of Baz2 class
baz_spec2 = GroupSpec('A composition inside',
                      data_type_def='Baz2',
                      data_type_inc=foo_spec,
                      attributes=[
                          AttributeSpec('attr3', 'an example float attribute', 'float'),
                          AttributeSpec('attr4', 'another example float attribute', 'float')])

baz_spec1 = GroupSpec('A composition test outside',
                      data_type_def='Baz1',
                      data_type_inc=foo_spec,
                      attributes=[AttributeSpec('attr3', 'an example float attribute', 'float'),
                                  AttributeSpec('attr4', 'another example float attribute', 'float')],
                      groups=[baz_spec2])

# add directly into the existing spec_catalog. would not do this normally.
spec_catalog.register_spec(baz_spec1, 'test.yaml')

Baz2 = manager.type_map.get_container_cls(CORE_NAMESPACE, 'Baz2')

1) Inform users that if they want to auto-generate the API, all type definitions must be in /groups and /datasets. This is a limitation only in style and not in function. The biggest downside here is that the rules used for writing extensions will be slightly more strict than those for the core (again, only stylistically).

2) You could see this freedom of the schema language as a bug, since it provides multiple right ways to do something without offering any benefit. We could change the schema language rules so that types can only be defined at the surface level and change the schema to match. This should not cause any compatibility issues for the core, but would raise some eyebrows. If we are going to do this we should do it now, before community extensions start to accumulate.

3) We could build logic to search the tree for new types.
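A rough sketch of option 3, operating on the raw dict form of a spec (hdmf's `GroupSpec` is dict-like, but this treats specs as plain dicts for illustration; the function name and keys shown are assumptions, not existing hdmf API):

```python
def find_nested_type_defs(spec, type_key='data_type_def'):
    """Recursively collect every type definition in a spec dict,
    including ones nested inside groups and datasets."""
    found = []
    if isinstance(spec, dict):
        if type_key in spec:
            found.append(spec[type_key])
        # descend into any sub-specs that may themselves define new types
        for sub_key in ('groups', 'datasets', 'links'):
            for sub in spec.get(sub_key, []):
                found.extend(find_nested_type_defs(sub, type_key))
    return found

# a Baz1 spec that defines Baz2 inline, mirroring the example above
baz1 = {'data_type_def': 'Baz1',
        'groups': [{'data_type_def': 'Baz2', 'data_type_inc': 'Foo'}]}
print(find_nested_type_defs(baz1))  # ['Baz1', 'Baz2']
```

`get_container_cls` (or the catalog registration step) could then walk every registered source spec this way and register each discovered definition, rather than looking only at the top level.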

Thoughts?


oruebel commented 5 years ago
1. Inform users ...

Having a "best practice" that types should be defined at the top level seems reasonable either way.

2. You could see this freedom of the schema language as a bug ...

From the perspective of the language I don't see why this would be considered a bug. While specifying new types at the top level may be a good practice, being able to define new types in nested levels seems perfectly reasonable. However, I understand that restricting it to the top-level may help simplify implementation of the schema language.

3. We could build logic to search the tree for new types.

My current inclination is that this is probably what we should do.