hdmf-dev / hdmf-schema-language

The specification language for HDMF
https://hdmf-schema-language.readthedocs.io
Other
0 stars 2 forks source link

Add best practices section #14

Open rly opened 3 years ago

rly commented 3 years ago

The schema language supports some flexibility in how data types are defined, and some methods are encouraged over others for clarity and consistency. These best practices should be added to the schema language documentation:

  1. Define new data types at the root of the schema rather than nested within another data type definition. Nested type definitions may in some cases lead to errors in HDMF. See https://github.com/hdmf-dev/hdmf/issues/511, https://github.com/hdmf-dev/hdmf/issues/316, and https://github.com/hdmf-dev/hdmf/issues/73
  2. Use the quantity key not in the data type definition but in the group/dataset spec where the type is included. When the data type is defined at the root of the schema (as opposed to nested), then in order to use the data type, a new group (subgroup) spec is defined where the quantity key is set to a value or if omitted, the default value of 1 would be used. This makes the quantity defined in the data type definition meaningless and confusing. See also https://github.com/NeurodataWithoutBorders/nwb-schema/pull/472
  3. Use the name key not in the data type definition but iin the group/dataset spec where the type is included. Mismatch between the name defined on the data type definition and where it is included can lead to confusion in the expected behavior and may lead to errors in HDMF. See https://github.com/hdmf-dev/hdmf/issues/582
  4. Create a new data type when adding attributes/datasets/groups/links to an existing data type. See #13. -- Make this a rule (stop allowing new specs with this)
  5. Modifying the dtype, shape, or quantity of a data type when using data_type_inc should only restrict the values from their original definitions. For example, if type A has dtype: text and type B extends type A (data_type_def: B, data_type_inc: A), then type B should not redefine dtype to be int which is incompatible with the dtype of type A. Same thing if type A is included and a new type is not defined (just data_type_inc: A). In other words, all children types should be valid against the parent type. This is not yet checked in HDMF but see progress in https://github.com/hdmf-dev/hdmf/pull/321 .
  6. Non-scalar values for the value and default_value keys are not yet supported by the official APIs, so these are discouraged until support is added.
  7. Don'e allow spaces in any names. https://github.com/NeurodataWithoutBorders/pynwb/issues/1421

@bendichter @oruebel @ajtritt Can you think of other best practices to add? Do you agree with the above?

rly commented 3 years ago

Question: How to validate against this case:

data_type_def: SpecialVectorData
data_type_inc: VectorData
dtype: text

data_type_def: DynamicTable
datasets:
- data_type_inc: VectorData  (validate against this as anonymous spec DynamicTable__VectorData)
  name: spike_times
  dtype: int
  shape: [None]

# question: should this be allowed?
# and what if SpecialVectorData does not specify a dtype?
MyTableBuilder
data type: DynamicTable
datasets:
- data type: SpecialVectorData
  name: spike_times
oruebel commented 3 years ago

See also https://github.com/hdmf-dev/hdmf-schema-language/issues/13#issuecomment-824249427 for additional best practices