sofwerx / cdb2-concept

CDB modernization

Attributes vs. Datatypes and Constraints #33

Open UnclePoole opened 4 years ago

UnclePoole commented 4 years ago

One open design question for an attribute data model is how to relate attributes to the datatypes that implement them.

A particular question is whether the datatypes should be constrained to just be the set of primitives (real, integer, boolean, text, codelist) or should be a more detailed datatype model similar to XSD where you can build up logical types from primitives via range constraints, measurement units, text patterns, and so on.

Many attributes may share the same logical constraints, e.g., a text pattern for date strings, or a real-valued angle constrained to [0, 360) with degree as its measurement unit.

However, an attribute may have a semantic range constraint that is stricter than the physical range implied by its measurement unit. For example, distance as a datatype may have the range [0, inf), but the WID attribute would typically have a narrower range such as (0, 100].
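To make the two layers concrete, here is a minimal Python sketch, not a proposed schema. The class and field names (`Interval`, `Datatype`, `Attribute`, `within`) are made up for illustration, and the metre unit for distance is an assumption. The ranges are the ones quoted above. The datatype carries the physical range, and the attribute may narrow it but never widen it.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Numeric range with independently open or closed bounds."""
    lo: float
    hi: float
    lo_closed: bool = True
    hi_closed: bool = False

    def contains(self, x: float) -> bool:
        above = x >= self.lo if self.lo_closed else x > self.lo
        below = x <= self.hi if self.hi_closed else x < self.hi
        return above and below

    def within(self, outer: "Interval") -> bool:
        """True if every value in this interval is also in `outer`."""
        lo_ok = self.lo > outer.lo or (
            self.lo == outer.lo and (outer.lo_closed or not self.lo_closed))
        hi_ok = self.hi < outer.hi or (
            self.hi == outer.hi and (outer.hi_closed or not self.hi_closed))
        return lo_ok and hi_ok


@dataclass(frozen=True)
class Datatype:
    """Reusable logical type: a primitive real plus unit and physical range."""
    name: str
    unit: str
    range: Interval


@dataclass(frozen=True)
class Attribute:
    """Named use of a datatype, optionally with a narrower semantic range."""
    code: str
    datatype: Datatype
    range: Optional[Interval] = None

    def __post_init__(self) -> None:
        # An attribute may narrow its datatype's range but not widen it.
        if self.range is not None and not self.range.within(self.datatype.range):
            raise ValueError(f"{self.code}: range exceeds datatype {self.datatype.name}")

    def is_valid(self, value: float) -> bool:
        effective = self.range if self.range is not None else self.datatype.range
        return effective.contains(value)


# Ranges from the discussion above; the unit for distance is assumed.
DISTANCE = Datatype("distance", "metre", Interval(0.0, math.inf, True, False))
ANGLE = Datatype("angle", "degree", Interval(0.0, 360.0, True, False))
WID = Attribute("WID", DISTANCE, Interval(0.0, 100.0, False, True))
```

With this shape, `WID.is_valid(0.0)` is false even though 0 is a legal distance, which is exactly the semantic-versus-physical distinction in question.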

This is one area where the GGDM and NAS parent standards diverge.

GGDM flattens datatypes into attributes: each attribute bound to a particular entity type carries its full constraint information at the binding, rather than referring to a separate datatype specification. So each distinct instance of WID has its own duplicate copy of the constraint information.

NAS has a separate datatype table with range, pattern, and measurement constraints. Attributes are just used to bind datatypes to particular names, meaning the same datatype can be reused for many different attributes.

CDB 1.2 places constraints at the attribute level only and datatypes are only the primitives.
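As a rough illustration of the three layouts, here is a Python sketch using plain dictionaries. Everything in it is hypothetical: the entity names (`Road`, `Bridge`), the datatype name `WidthMeasure`, the field names, and the metre unit are invented for the example and are not taken from GGDM, NAS, or CDB 1.2. Only the WID code and the (0, 100] range come from the discussion above.

```python
# Hypothetical data for illustration; not actual GGDM, NAS, or CDB 1.2 content.
WID_CONSTRAINT = {"primitive": "real", "unit": "metre", "range": "(0, 100]"}

# GGDM-style: the full constraint record is copied onto every
# (entity type, attribute) binding.
ggdm_bindings = {
    ("Road", "WID"): dict(WID_CONSTRAINT),
    ("Bridge", "WID"): dict(WID_CONSTRAINT),
}

# NAS-style: constraints live in a datatype table; an attribute only names
# the datatype it uses, and bindings carry no constraint data.
nas_datatypes = {"WidthMeasure": dict(WID_CONSTRAINT)}
nas_attributes = {"WID": "WidthMeasure"}
nas_bindings = {("Road", "WID"), ("Bridge", "WID")}

# CDB 1.2-style: datatypes are primitives only; constraints sit on the
# attribute itself, independent of entity type.
cdb12_attributes = {"WID": dict(WID_CONSTRAINT)}


def resolve_ggdm(entity: str, attr: str) -> dict:
    return ggdm_bindings[(entity, attr)]


def resolve_nas(entity: str, attr: str) -> dict:
    if (entity, attr) not in nas_bindings:
        raise KeyError((entity, attr))
    return nas_datatypes[nas_attributes[attr]]


def resolve_cdb12(entity: str, attr: str) -> dict:
    # The entity type plays no part in the lookup in this layout.
    return cdb12_attributes[attr]


# Number of stored constraint records needed for the same two bindings.
record_counts = {
    "ggdm": len(ggdm_bindings),
    "nas": len(nas_datatypes),
    "cdb12": len(cdb12_attributes),
}
```

All three resolve `("Road", "WID")` to the same constraint. They differ in where the record is stored and how often it is repeated. In this sketch only the flattened layout can vary the range per entity type without further extension.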

This gets particularly complex when dealing with enumerated qualitative attributes (codelists) where the list of valid vocabulary terms may vary both per attribute and per entity type (even NAS doesn't normalize duplicate terms across different attributes).
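One possible way to picture the codelist case is an attribute-level vocabulary with optional per-entity-type subsets, sketched below in Python. The attribute code `MAT`, the entity names, and the terms are all hypothetical, and this is only one of several ways the variation could be modelled.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical attribute code, entity types, and terms, for illustration only.
ATTRIBUTE_TERMS: Dict[str, FrozenSet[str]] = {
    "MAT": frozenset({"asphalt", "concrete", "steel", "wood"}),
}

# Optional narrower vocabularies for specific (entity type, attribute) pairs.
ENTITY_OVERRIDES: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("Road", "MAT"): frozenset({"asphalt", "concrete"}),
}


def allowed_terms(entity: str, attr: str) -> FrozenSet[str]:
    """Per-entity vocabulary if one is defined, else the attribute-level one."""
    return ENTITY_OVERRIDES.get((entity, attr), ATTRIBUTE_TERMS[attr])


def inconsistent_overrides() -> List[Tuple[str, str]]:
    """Overrides containing terms that are missing from the attribute-level list."""
    return [
        key
        for key, terms in ENTITY_OVERRIDES.items()
        if not terms <= ATTRIBUTE_TERMS[key[1]]
    ]
```

Here `allowed_terms("Road", "MAT")` returns the narrowed set, while any entity type without an override falls back to the full attribute-level list. Terms are still scoped to one attribute, so the same term used by two attributes would be stored twice.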

Thoughts?