sinclairzx81 / typebox

Json Schema Type Builder with Static Type Resolution for TypeScript

AOT compilation using custom registered types #869

Closed jcmoore closed 2 months ago

jcmoore commented 2 months ago

I have registered a type (which I'd love native support for -- particularly for its flexibility beyond Uint8Array):

TypeRegistry.Set('DataView', (_schema, value) => {
  return ArrayBuffer.isView(value) && !("length" in value);
});

This seems to fulfill my needs at runtime, but looking ahead to AOT compilation, it seems like TypeCompiler.Code() may not preserve much of the TSchema for custom kinds to inform registered checkers.

For the specific use-case above, I don't actually use the _schema argument, so it can probably be worked around just by dependency-injecting a kind() method that knows how to handle "DataView" types, but what would be the right approach for TSchemas that actually have some significant attributes (like schema.minByteLength/schema.maxByteLength)?

Am I misunderstanding something about how AOT compilation is intended to work? Would you consider extending the TypeCompiler (or TypeRegistry) API with something like the following (probably to be used in the TypeCompiler.Visit() method)?

export namespace TypeCompiler {
  export function SetGenerator(kind: string, codegen: CodeGenerator);
  export function GetGenerator(kind: string): CodeGenerator;
}

type CodeGenerator = (schema: TSchema, references: TSchema[], value: string) => string;
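To make the proposal concrete, here is a minimal, self-contained sketch of how such a generator registry might behave for a schema that carries attributes. It does not use TypeBox itself: `SketchSchema`, the `Map`-backed `SetGenerator`/`GetGenerator`, and the emitted expression are all assumptions made for illustration, not an existing API.

```typescript
// Hypothetical sketch: none of these names exist in TypeBox today.
type SketchSchema = { minByteLength?: number; maxByteLength?: number }
type CodeGenerator = (schema: SketchSchema, references: SketchSchema[], value: string) => string

// Generators keyed by kind name, mirroring how checkers are keyed in a registry.
const generators = new Map<string, CodeGenerator>()

function SetGenerator(kind: string, codegen: CodeGenerator): void {
  generators.set(kind, codegen)
}

function GetGenerator(kind: string): CodeGenerator | undefined {
  return generators.get(kind)
}

// The generator reads schema attributes and bakes them into the emitted expression.
SetGenerator('DataView', (schema, _references, value) => {
  const parts = [`${value} instanceof DataView`]
  if (schema.minByteLength !== undefined) parts.push(`${value}.byteLength >= ${schema.minByteLength}`)
  if (schema.maxByteLength !== undefined) parts.push(`${value}.byteLength <= ${schema.maxByteLength}`)
  return `(${parts.join(' && ')})`
})

const dataViewGenerator = GetGenerator('DataView')
const generated = dataViewGenerator
  ? dataViewGenerator({ minByteLength: 4 }, [], 'value.dataview')
  : ''
```

Because the attribute values are written into the generated string, the compiled output would not need access to the original schema at runtime.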

Happy to submit a PR -- thanks for your work on this rad library!

sinclairzx81 commented 2 months ago

@jcmoore Hi!

At this point in time, AOT compilation will require you to embed 3 auxiliary function calls into the output (if you're using custom types and formats). TypeBox will emit calls to these functions to perform exterior value checks. The following provides some high level information on the current design.


Custom Type AOT Compilation

Three functions (format, kind, and hash) will need to be embedded into the compiled string output.

Here is a quick draft setting up these functions (it's not very elegant):

import { Type, Kind, TypeRegistry, FormatRegistry, TSchema } from '@sinclair/typebox'
import { TypeCompiler } from '@sinclair/typebox/compiler'

// need to register these types for the compiler to pass preflight checks
TypeRegistry.Set('DataView', (schema, value) => value instanceof DataView)
FormatRegistry.Set('Email', (value) => value === 'user@domain.com')

// User defined AOT compile function
function CompileAOT<T extends TSchema>(name: string, schema: T) {
  // These functions need to be embedded into the generated output.
  const intrinsicFunctions = (`
  function format(format: string, value: string) {
    if(format === 'Email') return value === 'user@domain.com'
    return false
  }
  function kind(kind: string, ordinal: number, value: unknown) {
    if(kind === 'DataView') return value instanceof DataView
    return false
  }
  function hash(value: unknown) { 
    return value 
  }
  `)
  // Compile with TypeScript annotations
  const checkFunction = TypeCompiler.Code(schema, { language: 'typescript' })
  // Wrap everything in IIFE
  return `const ${name} = (() => { ${[intrinsicFunctions, checkFunction].join('\n')}})()`
}

// ---

const T = Type.Object({
  dataview: Type.Unsafe({ [Kind]: 'DataView' }),
  email: Type.String({ format: 'Email' }),
  unique: Type.Array(Type.Number(), { uniqueItems: true })
})

const C = CompileAOT('check', T)

console.log(C)

// const check = (() => { 
//   function format(format: string, value: string) {
//     if(format === 'Email') return value === 'user@domain.com'
//     return false
//   }
//   function kind(kind: string, ordinal: number, value: unknown) {
//     if(kind === 'DataView') return value instanceof DataView
//     return false
//   }
//   function hash(value: unknown) {
//     return value
//   }

// return function check(value: any): boolean {
//   return (
//     (typeof value === 'object' && value !== null && !Array.isArray(value)) &&
//     kind('DataView', 0, value.dataview) &&
//     (typeof value.email === 'string') &&
//     format('Email', value.email) &&
//     Array.isArray(value.unique) &&
//     value.unique.every((value: any) => (Number.isFinite(value))) &&
//     ((value: any) => { const set = new Set(); for(const element of value) { 
//       const hashed = hash(element); if(set.has(hashed)) { return false } else { set.add(hashed) } 
//     } return true } )(value.unique)
//   )
// }})()

Unfortunately, the above isn't very elegant as it does require the implementer to perform some fairly arcane string manipulation, but it is the best implementation under the current design.
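For custom kinds whose schemas carry meaningful attributes (such as the minByteLength/maxByteLength case raised in the question), one possible workaround under this design is sketched below. It assumes the ordinal passed to kind() identifies which schema produced the call, so that a build step can serialize a lookup table of those schemas alongside the intrinsic functions. The `schemaByOrdinal` table and the `ByteLengthSchema` type are hypothetical and stand in for whatever the build step would emit.

```typescript
// Assumption: the ordinal passed to kind() identifies the schema that produced the call.
type ByteLengthSchema = { minByteLength?: number; maxByteLength?: number }

// Hypothetical table, keyed by ordinal, that a build step would serialize into the output.
const schemaByOrdinal: Record<number, ByteLengthSchema> = {
  0: { minByteLength: 4, maxByteLength: 16 },
}

// Embedded kind() function that consults the table for schema attributes.
function kind(kindName: string, ordinal: number, value: unknown): boolean {
  if (kindName !== 'DataView') return false
  if (!(value instanceof DataView)) return false
  // Fall back to an unconstrained schema when the ordinal is not in the table.
  const schema: ByteLengthSchema = schemaByOrdinal[ordinal] ?? {}
  const { minByteLength = 0, maxByteLength = Infinity } = schema
  return value.byteLength >= minByteLength && value.byteLength <= maxByteLength
}
```

A generated call such as kind('DataView', 0, value.dataview) would then enforce the byte-length bounds without the compiled check itself knowing about them.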


Am I misunderstanding something about how AOT compilation is intended to work? Would you consider extending the TypeCompiler (or TypeRegistry) API with something like the following (probably to be used in the TypeCompiler.Visit() method)?

The above is generally how AOT compilation is intended to work: TypeBox pushes the complexity of generating AOT compiled output to the implementer (AOT is usually integrated into build tools). The design isn't ideal tbh and there is room to improve things, but there is a bit of indecision on the best way to do so. If implementing the SetGenerator, GetGenerator idea, I think this would just be a case of updating the TypeRegistry to support returning string check functions, something like the following.

TypeRegistry.Set('DataView', 
  (schema, value) => value instanceof DataView,
  (schema) => `value instanceof DataView` // added
)
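As a rough illustration of that idea, the sketch below pairs a runtime check with a string-returning check in a stand-alone registry and then evaluates the generated expression to confirm the two agree. `pairedRegistry`, `setPaired`, and `agrees` are hypothetical names; TypeBox's TypeRegistry.Set accepts only the runtime check today.

```typescript
// Hypothetical registry: this is a stand-alone sketch, not TypeBox's TypeRegistry.
type RuntimeCheck = (schema: unknown, value: unknown) => boolean
type CodegenCheck = (schema: unknown) => string // expression over a variable named `value`

const pairedRegistry = new Map<string, { check: RuntimeCheck; codegen: CodegenCheck }>()

function setPaired(kind: string, check: RuntimeCheck, codegen: CodegenCheck): void {
  pairedRegistry.set(kind, { check, codegen })
}

setPaired('DataView',
  (_schema, value) => value instanceof DataView,
  (_schema) => `value instanceof DataView`
)

// Evaluate the generated expression and compare it with the runtime check.
function agrees(kind: string, value: unknown): boolean {
  const entry = pairedRegistry.get(kind)
  if (entry === undefined) throw new Error(`Unknown kind: ${kind}`)
  const compiled = new Function('value', `return (${entry.codegen({})})`) as (value: unknown) => boolean
  return compiled(value) === entry.check({}, value)
}
```

Keeping both functions in one registration makes it harder for the runtime and AOT paths to drift apart, though nothing here enforces that they stay equivalent.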

However, I'm a bit reluctant to take on any updates here at the moment as I'm currently on the lookout for better approaches to creating custom types in TypeBox, ideally approaches that wouldn't require the TypeRegistry at all (as I'd like to phase out this registry over the next few revisions).

The main thing holding back a more elegant AOT is limitations in Json Schema, which has no way to appropriately validate against general JavaScript structures. TypeBox already provides a couple of extension types (Uint8Array and Date), but these could also be phased out in favor of the following...

const DataView = Type.Unsafe<DataView>(Type.InstanceOf('DataView'))

// const DataView = { instanceof: 'DataView' }

// would generate -> value instanceof globalThis['DataView']
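To show how a single keyword could drive both paths, here is a small sketch in which a runtime check and an AOT expression are derived from the same `{ instanceof: 'DataView' }` schema. The `instanceof` keyword, `InstanceOfSchema`, `checkInstanceOf`, and `generateInstanceOf` are all hypothetical; none of them are part of Json Schema or TypeBox.

```typescript
// Hypothetical `instanceof` keyword; not part of Json Schema or TypeBox.
type InstanceOfSchema = { instanceof: string }

// Runtime check derived from the schema alone: resolve the constructor by name.
function checkInstanceOf(schema: InstanceOfSchema, value: unknown): boolean {
  const ctor: unknown = Reflect.get(globalThis, schema.instanceof)
  return typeof ctor === 'function' && value instanceof ctor
}

// AOT expression derived from the same schema, with no registration step.
function generateInstanceOf(schema: InstanceOfSchema, value: string): string {
  return `${value} instanceof globalThis['${schema.instanceof}']`
}

const dataViewSchema: InstanceOfSchema = { instanceof: 'DataView' }
const emitted = generateInstanceOf(dataViewSchema, 'value')
```

Since both functions read only the schema, no registry lookup is needed at check time or at code generation time.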

As such, I'm currently considering putting together a "JavaScript Schema" specification that extends "Json Schema" with additional keywords that enable it to validate against arbitrary JavaScript structures. The specific goal is to encode enough information in the schematics that both Value.* checks and AOT generation can be derived from the schematics alone (without requiring registration).

This is a bit of a long and winding response (sorry!) but I hope it provides some insight into TypeBox AOT and where I'd like to take things in the library moving forward.

Happy to continue this discussion. Cheers S

sinclairzx81 commented 2 months ago

@jcmoore Heya,

Might convert this issue into a discussion as the example above is the current design (warts and all). As mentioned, I think there is room to improve AOT, but I think the path towards that involves devising extended schematics (keywords) that can appropriately validate JavaScript objects (rather than adding more functions to the API surface).

Again, happy to continue a discussion on this. Cheers S