overture-stack / lectern

Data Schema / Dictionary management system
GNU Affero General Public License v3.0
Feature Request: uniqueKey restriction for schema #176

Closed joneubank closed 1 year ago

joneubank commented 1 year ago

A unique key restriction can be added to a schema. It lists a set of fields whose values, taken together, must be unique across records. This is similar to a primary key constraint in a database.

Detailed Description

Each schema can have a property containing a list of field names from that schema. Together, these fields serve as a combined unique key for each record. When validating a data-set against this schema, Lectern validation clients should ensure that no two records in the data-set have the same values for every field in the unique key.
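A minimal sketch of the check described above, not Lectern's actual client code. The function name find_unique_key_violations is hypothetical, records are assumed to be plain dictionaries, and key values are assumed to be hashable (strings, numbers, booleans).

```python
from collections import defaultdict
from typing import Any


def find_unique_key_violations(
    records: list[dict[str, Any]], unique_key: list[str]
) -> dict[tuple, list[int]]:
    """Return each repeated key tuple mapped to the indices of the records sharing it."""
    indices_by_key: dict[tuple, list[int]] = defaultdict(list)
    for index, record in enumerate(records):
        # Build the combined key from the values of every field in the unique key.
        key = tuple(record.get(field) for field in unique_key)
        indices_by_key[key].append(index)
    # Only keys that occur more than once violate the restriction.
    return {key: found for key, found in indices_by_key.items() if len(found) > 1}
```

Returning the indices of every record in a duplicate group, rather than only the later ones, lets a client report all records involved in a conflict.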

Potential Risks

Lectern client implementations will need to be provided the entire data-set to enforce uniqueness. In applications where data is validated in chunks and added to a larger database, it might require custom implementations to read the entire data-set from the database to ensure the restriction is met. Still, marking this requirement on the schema so that clients can enforce this rule is useful.
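For the chunked case, one possible approach is to carry the set of already-seen keys between chunks, seeded with keys read from the existing database. This is a sketch under assumptions: UniqueKeyTracker and its existing_keys parameter are hypothetical names, and how the existing keys are loaded from the database is left to the application.

```python
from typing import Any, Iterable


class UniqueKeyTracker:
    """Remembers unique-key tuples across chunks (hypothetical helper)."""

    def __init__(self, unique_key: list[str], existing_keys: Iterable[tuple] = ()):
        self.unique_key = list(unique_key)
        # existing_keys would typically be loaded from the application's database.
        self.seen: set[tuple] = set(existing_keys)

    def check_chunk(self, chunk: list[dict[str, Any]]) -> list[int]:
        """Return the indices within this chunk whose key has already been seen."""
        duplicates = []
        for index, record in enumerate(chunk):
            key = tuple(record.get(field) for field in self.unique_key)
            if key in self.seen:
                duplicates.append(index)
            else:
                self.seen.add(key)
        return duplicates
```

Keeping every key in memory may not scale to very large data-sets; an application could instead query the database for each chunk's keys.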

Possible Implementation

Update the Lectern meta-schema so that each schema object can have an optional object called restrictions. As with field restrictions, this object cannot be empty.

Add to this restrictions object an optional property uniqueKey. The value must be a list of strings. Each value in this array must match the name of a field within the schema. This rule may not be enforceable through the meta-schema, so validation will need to be added in code.

A valid schema matching this meta-schema would look like:

{
  "name": "schema_name",
  "description": "Example Schema with a unique ID field",
  "restrictions": { // <===== New restrictions block
    "uniqueKey": ["submitter_id", "study_id"] // <===== unique key restriction
  },
  "fields": [
    {
      "name": "submitter_id",
      "valueType": "string",
      "description": "Unique identifier provided by submitter",
      "restrictions": {
        "required": true,
        "regex": "^[A-Z]{4,10}$"
      }
    },
    {
      "name": "study_id",
      "valueType": "string",
      "description": "Study the submitted data belongs to",
      "restrictions": {
        "required": true
      }
    },
    // ...more fields here
  ]
}
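The rule that every uniqueKey entry must name a field in the schema could be checked in code along these lines. This is only a sketch: unknown_unique_key_fields is a hypothetical name, and the schema is assumed to be already parsed into a dictionary with the shape shown in the example.

```python
from typing import Any


def unknown_unique_key_fields(schema: dict[str, Any]) -> list[str]:
    """Return the uniqueKey entries that do not match any field name in the schema."""
    field_names = {field["name"] for field in schema.get("fields", [])}
    # A schema without a restrictions block or without uniqueKey has nothing to check.
    unique_key = schema.get("restrictions", {}).get("uniqueKey", [])
    return [name for name in unique_key if name not in field_names]
```

An empty result means the uniqueKey list is consistent with the schema's fields; any returned names can be reported as schema validation errors.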

A note on missing values

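It is possible for a field that is part of a unique key to not have the required restriction. If this occurs, then a missing value is treated as one unique value for that field. For example, two records that both omit that field and match on every other field in the unique key are considered duplicates.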
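One way to treat a missing value as a single unique value is to normalize it before building the key. In this sketch, the helper name unique_key_tuple is hypothetical, and what counts as "missing" (an absent property, None, or an empty or whitespace-only string) is an assumption rather than something the proposal specifies.

```python
from typing import Any


def unique_key_tuple(record: dict[str, Any], unique_key: list[str]) -> tuple:
    """Build the key for a record, mapping every form of missing value to None."""

    def normalize(value: Any) -> Any:
        # Assumption: absent, None, and blank strings all count as "missing".
        if value is None or (isinstance(value, str) and value.strip() == ""):
            return None
        return value

    return tuple(normalize(record.get(field)) for field in unique_key)
```

With this normalization, two records that both lack a value for a key field compare equal on that field, so they conflict whenever the rest of the key also matches.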

Comparison to field level unique restriction

The field-level unique restriction (https://github.com/overture-stack/lectern/issues/175) does not conflict with this property. Validations for the schema's uniqueKey restriction can be applied separately from the field-level unique restriction. Lectern will not prevent a field within the uniqueKey list from also being restricted to be unique; both rules will be tested separately.
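To illustrate the two rules being tested separately, the sketch below runs both checks and reports each kind of violation on its own. The function name validate_uniqueness and the error tuples are hypothetical, and the field-level restriction is assumed to be a boolean property named unique, as proposed in the linked issue.

```python
from collections import Counter
from typing import Any


def validate_uniqueness(schema: dict[str, Any], records: list[dict[str, Any]]) -> list[tuple]:
    errors: list[tuple] = []
    # Field-level `unique`: each flagged field is checked on its own values.
    for field in schema.get("fields", []):
        if field.get("restrictions", {}).get("unique"):
            counts = Counter(record.get(field["name"]) for record in records)
            errors += [("unique", field["name"], value) for value, n in counts.items() if n > 1]
    # Schema-level `uniqueKey`: the combination of values is checked, independently of the above.
    unique_key = schema.get("restrictions", {}).get("uniqueKey")
    if unique_key:
        counts = Counter(tuple(record.get(name) for name in unique_key) for record in records)
        errors += [("uniqueKey", tuple(unique_key), key) for key, n in counts.items() if n > 1]
    return errors
```

A data-set can fail one rule and pass the other: repeated study_id values break a field-level unique rule on study_id even when every (submitter_id, study_id) pair is distinct.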

joneubank commented 1 year ago

With rules being added at the schema level, we will need to update the diff checker to report schema-level changes, not only changes to fields.

This may have consequences on the output format.

This applies to #177 as well.
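As a rough illustration of what reporting schema-level changes could involve, the sketch below compares the restrictions objects of two versions of a schema. It does not reflect the diff checker's real output: the function name diff_schema_restrictions and the created/deleted/updated labels are hypothetical.

```python
from typing import Any


def diff_schema_restrictions(
    old_schema: dict[str, Any], new_schema: dict[str, Any]
) -> dict[str, dict[str, Any]]:
    """Report schema-level restrictions that were added, removed, or changed."""
    old = old_schema.get("restrictions", {})
    new = new_schema.get("restrictions", {})
    changes: dict[str, dict[str, Any]] = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            changes[name] = {"type": "created", "data": new[name]}
        elif name not in new:
            changes[name] = {"type": "deleted", "data": old[name]}
        elif old[name] != new[name]:
            changes[name] = {"type": "updated", "data": new[name]}
    return changes
```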