lukeed / tschema

A tiny (490b) utility to build JSON schema types.
MIT License
685 stars 1 forks source link

Recursive types #9

Open dearlordylord opened 1 month ago

dearlordylord commented 1 month ago

Rerucsive types are useful and also supported by JSON schema https://json-schema.org/understanding-json-schema/structuring#recursion.

In other parser library APIs, this feature is often called "lazy".

Typebox calls it .Recursive.

an example of a recursive structure:

// Typebox:
const FileSystem = Type.Recursive((Self) =>
  Type.Intersect([
    Type.Object({
      name: Type.String({
        minLength: 1,
      }),
    }),
    Type.Union([
      Type.Object({
        type: Type.Literal('file'),
      }),
      Type.Object({
        type: Type.Literal('directory'),
        children: Type.Array(Self),
      }),
    ]),
  ])
);
lukeed commented 1 month ago

Definitely. Also want to add t.ref() and a de-referencer (t.deref() or t.expand().. tbd)

The 3 of them can be added at any point in a minor release.


Additionally, I may also look into $defs/definitions but that might make TS inference excessively complicated. One idea is that t.ref() can only pass objects and then $defs is auto-populated w/ the full definition. Another is that t.ref() may only reference keys defined within the options.$defs that are passed. For example:

let User = t.object({ ... });

// Option 1
let Post = t.object({
  title: t.string(),
  author: t.ref(User)
  // ...
});

// Option 2
let Post = t.object({
  title: t.string(),
  author: t.ref('user'),
  // ...
}, {
  $defs: {
    user: User,
  }
})

The first would auto-generate $defs object, making for the same output. But it'd require that User actually be pre-defined. And you may not always want $defs to exist in the output, and so maybe you opt out of that by defining $defs: false in the options?

let User = t.object({ ... });

let Post = t.object({
  title: t.string(),
  author: t.ref(User)
  // ...
}, {
  $defs: false,
});
//-> generates object type w/ "author.$ref" pointing to a disconnected/undefined definition... but you chose that
lukeed commented 1 month ago

I think I'm going to go with Option 1 (auto-build $defs) and instead of t.deref(), that utility is going to be called t.copy()

I'm still undecided on what the t.self()/t.recursive() utility is going to called and what its API will look like, but jotting down some options for that now:

// spec example:
{
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "children": {
      "type": "array",
      "items": { "$ref": "#" }
    }
  }
}

// Option 1: emulate typebox 
let person = t.recursive(self => {
  return t.object({
    name: t.string(),
    children: t.array(self),
  });
});

// Option 2: named helper
let person = t.object({
  name: t.string(),
  children: t.array(
    t.self(), // <<
  ),
});

// Option 3: If t.ref() accepts strings...
let person = t.object({
  name: t.string(),
  children: t.array(
    t.ref('#'), // <<
  ),
});

// Option 4: Add generic unsafe/raw method
// -> general-purpose escape hatch, you = responsible 
let person = t.object({
  name: t.string(),
  children: t.array(
    t.raw({ $ref: '#' }) // <<
  ),
});

I really dont like Option 1, but I wanted to include it for completeness. It'd be the only method that requires a callback and it'd be more annoying to pick up thru t.Infer correctly.

@dearlordylord Thoughts?

dearlordylord commented 1 month ago

To the first comment:

t.ref(User) seems a bit more reasonable to me: your library is about a DSL over JSON Schema generation. If a user has to think in terms of $refs (probably too hot take, not entirely convinced about it:), then why not just let them define JSON schema by themselves and validate it as AJV does? So, option (1) from the first comment looks reasonable to me. Working with objects/types is farther away from "defining schema manually" and seems to be the point!

To the second comment:

Options 3 and 4 don't seem to be what you need (again, assuming it's a DSL abstracting JSON schema away). Option 1 seems to be what a couple of libraries do, which makes API more understandable to users (and also, there may be reasons why they do it this specific way!). Ref: rescript-schema (.recursive), Typebox (.Recursive).

Some facts that may or may not be relevant, if you want to check them out: Zod and valita have .lazy() API that doesn't pass "self" but also requires you to duplicate runtime and type definitions - that isn't perfect although in my opinion I don't care about duping types, as long as "it compiles = it works". But this approach, if used, may undermine your "tschema infers types" argument. There's also Arktype that does it in a very special way with scope, but the library premise itself is super special (they parse string literals with typescript metaprogramming)

UPD:

adding arktype code for same structure just to show how cool it is

const fileSystem = scope({
  filename: '0<string<255',
  file: {
    type: "'file'",
    name: 'filename',
  },
  directory: {
    type: "'directory'",
    name: 'filename',
    children: [
      'root[]',
      ':',
      (v, ctx) => {
        if (new Set(v.map((f) => f.name)).size !== v.length) {
          return ctx.mustBe('names must be unique in a directory');
        }
        return true;
      },
    ],
  },
  root: 'file|directory',
}).resolve('root');