Closed ajtritt closed 3 years ago
This looks good to me. Please update the release notes when ready.
I wonder about naming it "EnumData" though. To me, this type should be VocabData and the other type should be EnumData, because I think controlled vocabularies are typically large and enumerated types are typically small.
How about "LargeVocabData"?
I approve of LargeVocabData
I would like to get rid of VocabData eventually to avoid redundancy, at which point the "LargeVocabData" label might not make sense. Furthermore, EnumData indicates that it is not fixed to just "vocabularies". Enum might not be the right name for a general solution, but I can't think of another word for "data that comes from a fixed set of values".
If VocabData will eventually be removed, can we replace it now with this new type and bump the major version number on the schema and HDMF?
> Furthermore, EnumData indicates that it is not fixed to just "vocabularies". Enum might not be the right name for a general solution, but I can't think of another word for "data that comes from a fixed set of values".
Sorry, I had misunderstood this previously. I think VocabData represents "data that comes from a fixed set of values" better than EnumData.
Enumerated types in many programming languages allow custom values to be associated with the name/key, e.g., Color = {RED = (1, 0, 0), GREEN = (0, 1, 0)}. Color.RED then acts as an immutable constant. hdmf-common-schema allows customizing the index for a particular name (and I think the API does not allow customizing this). So EnumData may imply more functionality than it supports.
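To make the comparison concrete, here is a minimal sketch of the enumerated-type behavior described above, using Python's standard `enum` module. The `Color` class and its RGB tuples are taken from the example in this comment; the `reassigned` flag is only there for illustration. This shows the general language feature, not anything in the HDMF API.

```python
from enum import Enum


class Color(Enum):
    # Each member name is bound to a custom value (an RGB tuple here).
    RED = (1, 0, 0)
    GREEN = (0, 1, 0)


# A member carries both its name and its associated value.
print(Color.RED.name, Color.RED.value)

# Members can be looked up by name or by value.
assert Color["GREEN"] is Color.GREEN
assert Color((1, 0, 0)) is Color.RED

# Members behave as constants: rebinding one raises AttributeError.
reassigned = True
try:
    Color.RED = (0, 0, 1)
except AttributeError:
    reassigned = False
```

The point of the sketch is that an enum in this sense binds arbitrary values to names, which is more than a type that only maps indices onto a fixed list of elements.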
See also our previous discussion on naming this type here: https://github.com/hdmf-dev/hdmf-common-schema/issues/29
FixedValuesData could also work but sounds clunky.
This concept is nothing new to the data world, so I'd prefer not to invent some new name for the sake of being explicit. ML people might call this "vocabulary data". They might call it "categorical data". A computer scientist would call it "enumerated data".
We have 3 choices. If we choose one of these three, I am confident that the level of confusion experienced by users will be negligible in comparison to what this discussion suggests we are anticipating.
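Whichever name is chosen, the underlying idea is the same: the distinct values are stored once, and the data itself is a sequence of integer indices into that fixed set. A rough sketch in plain Python, where `encode`, `decode`, and the sample species values are hypothetical names made up for illustration and not part of HDMF or hdmf-common-schema:

```python
def encode(values):
    """Split raw values into a fixed list of elements plus integer codes."""
    elements = []   # distinct values, in order of first appearance
    index_of = {}   # value -> position in `elements`
    codes = []      # one integer index per input value
    for value in values:
        if value not in index_of:
            index_of[value] = len(elements)
            elements.append(value)
        codes.append(index_of[value])
    return elements, codes


def decode(elements, codes):
    """Recover the original values from the elements and the codes."""
    return [elements[i] for i in codes]


raw = ["mouse", "rat", "mouse", "human", "rat"]
elements, codes = encode(raw)
print(elements)  # ['mouse', 'rat', 'human']
print(codes)     # [0, 1, 0, 2, 1]
assert decode(elements, codes) == raw
```

This is the same layout that "categorical", "vocabulary", and "enumerated" data all describe, which is why any of the three names should be understandable.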
Good point. It will be pretty clear either way. I also forgot that we are no longer restricting the fixed values to be strings, and VocabData kind of implies string terms, so I take back my preference for that. I like both CategoricalData and EnumData. Your choice!
> like both CategoricalData and EnumData. Your choice!
I like CategoricalData a bit better because I think it may be more intuitive for non-CS users, but either term will work.
Summary of changes
PR checklist for schema changes
docs/source/conf.py and common/namespace.yaml to the next version with the suffix "-alpha"
docs/source/format_release_notes.rst