hdmf-dev / hdmf-common-schema

Specifications for pre-defined data structures provided by HDMF.

add EnumData #51

Closed ajtritt closed 3 years ago

ajtritt commented 3 years ago

Summary of changes

PR checklist for schema changes

rly commented 3 years ago

This looks good to me. Please update the release notes when ready.

rly commented 3 years ago

I wonder about naming it "EnumData" though. To me, this type should be VocabData and the other type should be EnumData because I think controlled vocabularies are typically large and enumerated types are typically small.

oruebel commented 3 years ago

How about "LargeVocabData"?

rly commented 3 years ago

I approve of LargeVocabData.

ajtritt commented 3 years ago

I would like to get rid of VocabData eventually to avoid redundancy, at which point the "LargeVocabData" label might not make sense. Furthermore, EnumData indicates that it is not fixed to just "vocabularies". Enum might not be the right name for a general solution, but I can't think of another word for "data that comes from a fixed set of values".

rly commented 3 years ago

If VocabData will eventually be removed, can we replace it now with this new type and bump the major version number on the schema and HDMF?

rly commented 3 years ago

> Furthermore, EnumData indicates that it is not fixed to just "vocabularies". Enum might not be the right name for a general solution, but I can't think of another word for "data that comes from a fixed set of values".

Sorry, I had misunderstood this previously. I think VocabData represents "data that comes from a fixed set of values" better than EnumData.

Enumerated types in many programming languages allow custom values to be associated with the name/key, e.g., Color = {RED = (1, 0, 0), GREEN = (0, 1, 0)}. Color.RED then acts as an immutable constant. hdmf-common-schema allows customizing the index for a particular name (and I think the API does not allow customizing this). So EnumData may imply more functionality than it supports.
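To make the distinction above concrete, here is a minimal Python sketch. It contrasts a language-level enumerated type, where each name carries a custom value, with a plain index-based encoding, where a name's only associated value is its position in a list. The names `elements`, `data`, and `decoded` are illustrative assumptions, not the schema's fields or the HDMF API.

```python
from enum import Enum


class Color(Enum):
    # Each name is bound to a custom value chosen by the author.
    RED = (1, 0, 0)
    GREEN = (0, 1, 0)


# Index-based encoding (hypothetical layout, for illustration only):
# the fixed set of values is a list, and the data stores positions in it.
elements = ["RED", "GREEN"]
data = [0, 1, 1, 0]

# Decoding maps each stored index back to its element.
decoded = [elements[i] for i in data]

print(Color.RED.value)  # (1, 0, 0)
print(decoded)          # ['RED', 'GREEN', 'GREEN', 'RED']
```

In the first form the value attached to `RED` is arbitrary; in the second, the position in `elements` is the only thing that links a stored integer to a name.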

See also our previous discussion on naming this type here: https://github.com/hdmf-dev/hdmf-common-schema/issues/29

FixedValuesData could also work but sounds clunky.

ajtritt commented 3 years ago

This concept is nothing new to the data world, so I'd prefer not to invent some new name for the sake of being explicit. ML people might call this "vocabulary data". They might call it "categorical data". A computer scientist would call it "enumerated data".

We have 3 choices. If we choose one of these three, I am confident that the level of confusion experienced by users will be negligible in comparison to what this discussion suggests we are anticipating.

rly commented 3 years ago

Good point. It will be pretty clear either way. I also forgot that we are no longer restricting the fixed values to be strings, and VocabData kind of implies string terms, so I take back my preference for that. I like both CategoricalData and EnumData. Your choice!
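As a rough illustration of fixed values that are not strings, the sketch below encodes a column of floats as indices into a fixed set and rejects anything outside that set. The helpers `encode` and `decode` are hypothetical names made up for this example and are not part of HDMF or the schema.

```python
def encode(values, elements):
    """Map each value to its position in the fixed set `elements`."""
    lookup = {element: index for index, element in enumerate(elements)}
    encoded = []
    for value in values:
        if value not in lookup:
            # Values outside the fixed set are rejected rather than added.
            raise ValueError(f"{value!r} is not one of the fixed values")
        encoded.append(lookup[value])
    return encoded


def decode(indices, elements):
    """Recover the original values from their stored indices."""
    return [elements[i] for i in indices]


# The fixed set here holds floats, not string terms.
elements = [0.5, 1.0, 2.0]
values = [1.0, 0.5, 2.0, 1.0]

indices = encode(values, elements)
print(indices)                    # [1, 0, 2, 1]
print(decode(indices, elements))  # [1.0, 0.5, 2.0, 1.0]
```

Nothing in this encoding depends on the elements being text, which is the sense in which a name like VocabData could suggest a narrower type than what is stored.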

oruebel commented 3 years ago

> I like both CategoricalData and EnumData. Your choice!

I like CategoricalData a bit better because I think it may be more intuitive for non-CS users, but either term will work.