overture-stack / lectern

Data Schema / Dictionary management system
GNU Affero General Public License v3.0

Refactor: Schema Definition in TypeScript with Zod #181

Closed: joneubank closed this 1 year ago

joneubank commented 1 year ago

Purpose of this Refactor: Improve Types

This work started as an attempt to improve how the type system is used throughout the repository. That effort revealed an opportunity to improve the way the Dictionary Meta-Schema is defined.

The originally planned type system improvements were to:

  1. Set tsconfig.json to use strict: true (and fix the errors this causes).
  2. Define the Dictionary and Schema types in TS.
  3. Remove all uses of the any type where the actual types are known.
  4. Remove all unsafe or incorrect type casts throughout.
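Items 3 and 4 can be illustrated with a small, hypothetical helper. Under strict: true, an any-typed parameter silently disables checking, while accepting unknown and narrowing it with a type guard gives the same result without a cast the compiler cannot verify. The Dictionary interface and the function names below are simplified stand-ins for illustration, not Lectern's actual code.

```typescript
// Hypothetical, simplified Dictionary shape used only for this illustration.
interface Dictionary {
  name: string;
  version: string;
  schemas: { name: string }[];
}

// Before: `any` turns off type checking, so malformed input is only
// discovered at runtime (or produces undefined entries).
function getSchemaNamesUnsafe(dictionary: any): string[] {
  return dictionary.schemas.map((schema: any) => schema.name);
}

// After: a type guard that inspects the value before TS treats it as a Dictionary.
function isDictionary(value: unknown): value is Dictionary {
  if (typeof value !== "object" || value === null) return false;
  // Viewing a non-null object as a string-keyed record assumes no shape yet.
  const candidate = value as Record<string, unknown>;
  return (
    typeof candidate.name === "string" &&
    typeof candidate.version === "string" &&
    Array.isArray(candidate.schemas) &&
    candidate.schemas.every(
      (schema: unknown) =>
        typeof schema === "object" &&
        schema !== null &&
        typeof (schema as Record<string, unknown>).name === "string",
    )
  );
}

function getSchemaNames(value: unknown): string[] {
  if (!isDictionary(value)) {
    throw new Error("Value is not a valid Dictionary");
  }
  // `value` is narrowed to Dictionary here, with no cast required.
  return value.schemas.map((schema) => schema.name);
}
```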

Working on item 2 (writing TS types for the Dictionary Meta-Schema) revealed several problems with the previous setup:

  1. Keeping internal types consistent with changes to the Meta-Schema JSON would be an ongoing maintenance challenge.
  2. Some validations performed in code were not reflected in the Meta-Schema.
  3. Validations were not performed in one place; they were scattered across different steps in the code base.
  4. Some required validations were not being performed at all.
  5. Some required validations could not be expressed in a Meta-Schema defined with JSON Schema.
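Item 5 can be illustrated with a rule such as "field names must be unique within a schema". JSON Schema's uniqueItems keyword compares whole array items, so two fields with the same name but different value types would still pass. The following is a minimal sketch of how a Zod superRefine can express that rule right next to the structural definition; the field shape and the rule are hypothetical examples, not necessarily checks Lectern performs.

```typescript
import { z } from "zod";

// Hypothetical, simplified field definition for illustration.
const FieldSchema = z.object({
  name: z.string().min(1),
  valueType: z.enum(["string", "integer", "number", "boolean"]),
});

const SchemaDefinition = z
  .object({
    name: z.string().min(1),
    fields: z.array(FieldSchema),
  })
  // Cross-item rule: field names must be unique within one schema.
  // This runs only after the structural checks above have passed.
  .superRefine((schema, ctx) => {
    const seen = new Set<string>();
    schema.fields.forEach((field, index) => {
      if (seen.has(field.name)) {
        ctx.addIssue({
          code: "custom",
          message: `Duplicate field name "${field.name}"`,
          path: ["fields", index, "name"],
        });
      }
      seen.add(field.name);
    });
  });
```

Parsing a schema whose second field repeats the name of the first fails with a single issue pointing at fields[1].name, so the error location is as precise as a structural error would be.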

It is also worth highlighting that TypeScript developers in 2023 have tools that were not available when Lectern was first written. Specifically, we now have zod, a TypeScript-first schema definition library that is becoming an industry standard. Zod can resolve all of the issues highlighted above, and it enables what are (in my opinion) the two most important changes:

  1. Meta-Schema and TS Type information will always be in sync
  2. All schema validations can be written in a single logical place
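A minimal sketch of how one Zod definition can serve as both the runtime validator and the source of the TS type, via z.infer. The schema below is a heavily simplified, hypothetical stand-in for what dictionaryTypes.ts defines; the property names (including valueType and isArray) are assumptions for illustration.

```typescript
import { z } from "zod";

// Hypothetical, simplified Meta-Schema. The runtime validator and the static
// type are both derived from this single definition.
const FieldSchema = z.object({
  name: z.string().min(1),
  valueType: z.enum(["string", "integer", "number", "boolean"]),
  isArray: z.boolean().optional(),
});

const DictionarySchema = z.object({
  name: z.string().min(1),
  version: z.string().min(1),
  schemas: z.array(
    z.object({
      name: z.string().min(1),
      fields: z.array(FieldSchema),
    }),
  ),
});

// The TS type is inferred from the schema, so the two cannot drift apart.
type Dictionary = z.infer<typeof DictionarySchema>;

// Validate unknown input (e.g. parsed JSON) and return a typed value.
// parse() throws a ZodError if the input does not match the schema.
function loadDictionary(input: unknown): Dictionary {
  return DictionarySchema.parse(input);
}
```

Adding or changing a property in DictionarySchema updates the Dictionary type automatically; there is no separate JSON file to keep aligned by hand.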

This matters for code maintenance, because the JSON schema and the TS types can no longer become inconsistent. It also lets developers take full advantage of the TS type system, with the reliability and compile-time feedback it provides. Finally, it standardizes how Dictionary validation is performed, placing the content rules next to the structure definition and ensuring both are always run the same way.

Changes in This PR

There are a lot of files touched in this PR, but the important changes are isolated to a few of them. The long list of changed files results from related import updates, small type fixes, and some files being moved.

The important changes are:

  1. /src/types/dictionaryTypes.ts - Zod Schema definitions, and exports of types for Dictionary, Schema, fields, restrictions, and all of the composite types.
  2. Removal of MetaSchema.json - Replaced by dictionaryTypes.ts.
  3. Moving DB Model code (mongoose schema) into the /src/db/dictionaryModel.ts file.
  4. Moving express route handlers to dedicated router files in /src/routers.
  5. Moving client API code into the directory /src/external - This includes ego and vault integration code.
  6. Adding request validation to /src/controllers/dictionaryController.ts.
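The request validation added in /src/controllers/dictionaryController.ts is not reproduced here. As a hypothetical sketch of the general pattern, a controller can run safeParse on the untrusted request body and return a 400 with the list of issues before any database work happens. The body schema, the createDictionary name, and the response shape are assumptions, and the function is kept framework-agnostic rather than tied to express.

```typescript
import { z } from "zod";

// Hypothetical request body for a "create dictionary" endpoint.
const CreateDictionaryBody = z.object({
  name: z.string().min(1),
  version: z.string().min(1),
  schemas: z.array(z.object({ name: z.string().min(1) })).min(1),
});
type CreateDictionaryBody = z.infer<typeof CreateDictionaryBody>;

// Hypothetical response shape, independent of any HTTP framework.
type ControllerResponse =
  | { status: 201; body: CreateDictionaryBody }
  | { status: 400; body: { error: string; details: string[] } };

// Validate the request body before any business logic or DB access.
function createDictionary(requestBody: unknown): ControllerResponse {
  const result = CreateDictionaryBody.safeParse(requestBody);
  if (!result.success) {
    return {
      status: 400,
      body: {
        error: "Invalid dictionary",
        // Report each issue as "path: message" so clients can locate it.
        details: result.error.issues.map(
          (issue) => `${issue.path.join(".")}: ${issue.message}`,
        ),
      },
    };
  }
  // result.data is fully typed here; persisting it is out of scope for this sketch.
  return { status: 201, body: result.data };
}
```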

Some other updates performed:

  1. Tests were updated to use the new type system; some test fixtures were invalid and needed updating.
  2. A directory was added for sharing sample Dictionaries and Schemas; two basic examples are provided to start with.
  3. package.json has been updated to version 2+, using the pre-release version tag "next".
  4. The copyright notice year was updated to 2023 for consistency.
  5. Imports were organized in all /src files.

Bugs Fixed

joneubank commented 1 year ago

Not mentioned in the PR description: we can generate a JSON Schema formatted Meta-Schema for the Lectern Dictionary structure, so that a language-agnostic schema is available to validate against. We should add a step to the build that generates this Meta-Schema file from the Zod schemas.
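One way to automate this, sketched under the assumption that the third-party zod-to-json-schema package (which exports zodToJsonSchema) is added as a dev dependency. The simplified DictionarySchema and the buildMetaSchema and writeMetaSchema helpers are hypothetical; a build script would call writeMetaSchema with the desired output path.

```typescript
import { writeFileSync } from "node:fs";
import { z } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// Hypothetical, simplified stand-in for the Dictionary Zod schema.
const DictionarySchema = z.object({
  name: z.string().min(1),
  version: z.string().min(1),
  schemas: z.array(
    z.object({
      name: z.string().min(1),
      fields: z.array(z.object({ name: z.string().min(1) })),
    }),
  ),
});

// Convert the Zod schema into a language-agnostic JSON Schema document.
function buildMetaSchema() {
  return zodToJsonSchema(DictionarySchema);
}

// Intended to run as a build step; the caller chooses the output path.
function writeMetaSchema(outputPath: string): void {
  writeFileSync(outputPath, JSON.stringify(buildMetaSchema(), null, 2) + "\n");
}
```

Only structural rules survive the conversion: refinements such as superRefine checks have no JSON Schema equivalent and are not carried over, so the generated file would validate less strictly than the Zod schemas themselves.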

It would also be valuable to add versions of the create dictionary, add schema, and update schema methods that act as validators without updating the stored dictionaries. These endpoints would return the dictionary as it would look after the updates were made, but nothing would be committed to the DB. It would be best to put these on a URI path separate from the endpoints that update the DB.
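A minimal sketch of the dry-run idea for the add schema case, assuming a simplified, hypothetical Meta-Schema with a unique-schema-name rule. The update is applied to an in-memory copy, the result is validated, and either the would-be dictionary or the validation messages are returned. Nothing is written to the database, so a function like this could back an endpoint on its own URI path. The names previewAddSchema and PreviewResult are illustrative only.

```typescript
import { z } from "zod";

// Hypothetical, simplified Meta-Schema used for this illustration.
const SchemaDefinition = z.object({
  name: z.string().min(1),
  fields: z.array(z.object({ name: z.string().min(1) })),
});
type SchemaDefinition = z.infer<typeof SchemaDefinition>;

const DictionarySchema = z
  .object({
    name: z.string().min(1),
    version: z.string().min(1),
    schemas: z.array(SchemaDefinition),
  })
  // Schema names must be unique within a dictionary.
  .superRefine((dictionary, ctx) => {
    const names = dictionary.schemas.map((schema) => schema.name);
    names.forEach((name, index) => {
      if (names.indexOf(name) !== index) {
        ctx.addIssue({
          code: "custom",
          message: `Duplicate schema name "${name}"`,
          path: ["schemas", index, "name"],
        });
      }
    });
  });
type Dictionary = z.infer<typeof DictionarySchema>;

type PreviewResult =
  | { valid: true; dictionary: Dictionary }
  | { valid: false; errors: string[] };

// Dry run of "add schema": build the updated dictionary in memory and
// validate it. The input dictionary is not mutated and nothing is persisted.
function previewAddSchema(current: Dictionary, newSchema: SchemaDefinition): PreviewResult {
  const candidate = { ...current, schemas: [...current.schemas, newSchema] };
  const result = DictionarySchema.safeParse(candidate);
  return result.success
    ? { valid: true, dictionary: result.data }
    : { valid: false, errors: result.error.issues.map((issue) => issue.message) };
}
```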