hdmf-dev / hdmf-common-schema

Specifications for pre-defined data structures provided by HDMF.
Other
3 stars 8 forks source link

Either Data needs shape options or the default shape should be any shape #83

Open rly opened 3 months ago

rly commented 3 months ago

By default, a dataset's shape is null which means the attribute/dataset is a scalar. ref In object-oriented programming - specifically, the Liskov substitution principle - and our type system, subtypes (subclasses) should not expand the range of allowed options from the supertype (superclass).

The dataset data type Data has no shape defined, so its shape is scalar. Pretty much all subtypes of Data define a non-scalar shape. This breaks the above principle.

This had low impact because almost all subtypes of Data that are used define a shape (except for NWB's ScratchData) but as we are now trying to improve how the API resolves and validates shapes, the base shape matters.

Option 1: Add scalar, 1D, 2D, 3D, 4D, 5D shape options to Data.

Option 2: Make Data a special case where the default is any shape. This is kind of hacky.

Option 3: Change the default shape to be any shape for datasets. This could break existing schema that do not specify a shape and assume it is a scalar. I think other options are better.

rly commented 3 months ago

@oruebel curious your thoughts. I think there is value to having a special key that represents Any shape. We have added that to LinkML: https://linkml.io/linkml/schemas/arrays.html#any

oruebel commented 3 months ago

. I think there is value to having a special key that represents Any shape

That sounds reasonable

bendichter commented 3 months ago

What's wrong with just saying shape: null means Any?

bendichter commented 3 months ago

It seems to me the scalar should be the special case.

rly commented 3 months ago

It seems to me the scalar should be the special case.

I agree the scalar should be the special case. But this will modify the meaning of existing schema where a shape was not defined, e.g., https://github.com/NeurodataWithoutBorders/nwb-schema/blob/dev/core/nwb.file.yaml#L27

Thinking through this some more, I realize that this is not totally a breaking change. It just means more options (all options) will be valid when no shape is specified, including scalar. These schema would need to be updated to say to restrict the shape to a scalar using a new special value for scalar.

We version the language and are already preparing a 3.0.0 release so such a change would be OK.

I am happy to make scalar be the special case instead.

bendichter commented 3 months ago

I like having null mean any shape for a few reasons

rly commented 3 months ago

Sounds good. In LinkML, we also decided that shape: null = any shape. I was just worrying about compatibility. I'll make the change to shape: scalar