Closed joneubank closed 1 year ago
I am considering changing this requirement to allow different behaviour for missing values. It might be desirable to only enforce uniqueness on records that have a value in that field, allowing any number of missing values.
In this case we should change the restriction from true/false to having modes:
"unique": "all"
- no value more than once, only one missing value allowed
"unique": "values"
- all provided values must be unique, any number of missing values are allowed
if unique restriction is not provided then no uniqueness enforced
"unique": true
- all values unique, only one missing value allowed
"uniqueOrMissing": true
- all provided values must be unique, any number of missing values are allowed
If both restrictions are set the more strict unique
restriction will be enforced.
Last edit about missing values:
It seems most reasonable to not enforce uniqueness on missing values. If the field is not required, and a value is not provided, then the uniqueness restriction is not considered. Same way code_list and regex would be handled.
This gives the original proposal (always strict), and three alternatives to consider. Will confirm implementation in a comment after team review.
After discussion it is agreed that the standard approach to monitoring uniqueness is that only fields with values will be considered when applying the uniqueness restriction. This means that any number of records with undefined
values are allowed, but if the field has a value then it must be unique. To ensure every record is unique, the field should also have the required
restriction set to true
.
This matches Alternative 3
.
Add a field restriction that will declare each value of the field must be unique.
Detailed Description
Add a field level restriction that specifies all values of this field must be unique. This specifies that when a data-set is validated against this schema that each record will be checked to ensure the same value does not occur for this field more than once within the data set.
Potential Risks
Lectern client implementations will need to be provided the entire data-set to enforce uniqueness. In applications where data is validated in chunks and added to a larger datbase, it might require custom implementations to read the entire data-set from the database to ensure the restriction is met. Still, marking this requirement on the schema so that clients can enforce this rule is a useful.
Possible Implementation
Add to the meta schema a new restriction on a schema field called
unique
that will have a boolean valuetrue
orfalse
, similar to required.A valid schema matching this meta-schema would look like:
A note on missing values
It is possible for a field with
unique
restriction to not have therequired
restriction. If this occurs, then a missing value is treated as one unique value for that field: only one record can have a missing value.