How about producing and emitting schemas?

romeerez commented 2 months ago

I'm glad to find the standard-schema initiative, thanks for starting it!

I'd like to ask about two different cases: producing at runtime, and emitting schemas.

Producing schemas at runtime

In my case, it's an ORM where the user defines column types in a specific way, and it can produce zod and valibot schemas at runtime:

// imaginary syntax:
const dbTable = defineTable('myTable', (t) => ({
  id: t.serial().primaryKey(),
  name: t.text().unique(),
  bio: t.text().max(1000),
}))

dbTable.outputSchema() // zod or valibot schema, based on user config
dbTable.inputSchema()
dbTable.createSchema() // id is omitted
dbTable.updateSchema() // partial

That's my use case, and it applies to any ORM, query builder, or similar library that lets users define types with a custom DSL. The DSL has to be custom because of each library's specifics.

I propose defining an AST that could eventually cover all features of all the validation libraries, starting with a limited set, of course. The AST would be translated to a target schema by a third-party library.

So standard-schema defines an interface, zod defines specific schemas, and an imaginary package @standard-schema/zod-producer connects to the standard-schema interface and knows how to produce a zod schema from the AST.

Different libraries have different feature sets, so there should be a way to require a set of needed features. If, for example, a client library needs discriminated unions but the user specifies a schema producer that doesn't support them, it fails at runtime. It could fail at compile time as well.
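To make the feature-requirement idea concrete, here is a minimal sketch of what such an AST and producer interface might look like. Every name in it (`SchemaNode`, `Feature`, `SchemaProducer`, `featuresOf`, `assertSupported`) is hypothetical and not part of standard-schema or any existing package; it only illustrates the runtime failure described above.

```typescript
// Hypothetical AST and producer interface; none of these names exist in standard-schema.
type Feature = "string" | "number" | "object" | "maxLength" | "discriminatedUnion";

type SchemaNode =
  | { kind: "string"; maxLength?: number }
  | { kind: "number" }
  | { kind: "object"; fields: Record<string, SchemaNode> }
  | { kind: "discriminatedUnion"; discriminator: string; options: SchemaNode[] };

// A producer turns the AST into a schema of some target library (type T)
// and declares which features it is able to translate.
interface SchemaProducer<T> {
  name: string;
  supports: ReadonlySet<Feature>;
  produce(node: SchemaNode): T;
}

// Collect every feature that a given AST actually uses.
function featuresOf(node: SchemaNode, out: Set<Feature> = new Set<Feature>()): Set<Feature> {
  out.add(node.kind);
  switch (node.kind) {
    case "string":
      if (node.maxLength !== undefined) out.add("maxLength");
      break;
    case "object":
      for (const child of Object.values(node.fields)) featuresOf(child, out);
      break;
    case "discriminatedUnion":
      for (const option of node.options) featuresOf(option, out);
      break;
  }
  return out;
}

// Fail at runtime when the chosen producer cannot handle the AST.
function assertSupported<T>(producer: SchemaProducer<T>, node: SchemaNode): void {
  const missing = Array.from(featuresOf(node)).filter((f) => !producer.supports.has(f));
  if (missing.length > 0) {
    throw new Error(`${producer.name} does not support: ${missing.join(", ")}`);
  }
}
```

A client library would call `assertSupported(producer, ast)` once before producing anything, so an unsupported feature surfaces as a single clear error rather than a partially translated schema.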

Schemas code-generation

Producing schemas at runtime has a caveat: the produced validation schema depends on the library (ORM) itself. In my case the ORM is backend-only, so in a monorepo the user can't import such a schema into the frontend. This feature has been requested a couple of times.

To support reusing validation on the frontend, the only way is to have a CLI command that generates schema code (let me know if there are other ways).

Code-generating schemas would also benefit libraries whose types are defined outside of JS: GraphQL schemas, protobufs, introspected database schemas.

The same AST described above could be used by a third-party library to generate code.
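As a rough illustration of the code-generation path, the sketch below walks a tiny AST and emits zod source text as a string. The `SchemaNode` shape and the `emitZod` / `emitModule` helpers are assumptions made up for this example; only the emitted zod calls (`z.object`, `z.string().max`, `z.number().int`) are real zod API.

```typescript
// Hypothetical AST shape, used only for this example.
type SchemaNode =
  | { kind: "string"; maxLength?: number }
  | { kind: "number"; integer?: boolean }
  | { kind: "object"; fields: Record<string, SchemaNode> };

// Emit a zod expression (as source text) for one AST node.
function emitZod(node: SchemaNode): string {
  switch (node.kind) {
    case "string":
      return node.maxLength === undefined
        ? "z.string()"
        : `z.string().max(${node.maxLength})`;
    case "number":
      return node.integer ? "z.number().int()" : "z.number()";
    case "object": {
      const fields = Object.entries(node.fields)
        .map(([key, child]) => `${JSON.stringify(key)}: ${emitZod(child)}`)
        .join(", ");
      return `z.object({ ${fields} })`;
    }
  }
}

// Wrap the expression in a module that a frontend package could import.
function emitModule(exportName: string, node: SchemaNode): string {
  return `import { z } from "zod";\n\nexport const ${exportName} = ${emitZod(node)};\n`;
}
```

Because the output is plain source text with no dependency on the ORM, a CLI command could write it to a shared package that the frontend imports.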

fabian-hiller commented 1 month ago

Thanks for sharing these ideas! I was already thinking about an AST-like representation of schemas, similar to JSON Schema but optimized for TypeScript. Probably not something we want to tackle right now, but definitely something very interesting for the future. This AST representation can then be used to directly validate unknown data or to generate its representation in a specific schema library like Zod, Valibot, or ArkType.

romeerez commented 1 month ago

I can see how it sounds too complex and not too realistic, so it's understandable why we don't currently have such a common AST.

As for runtime schema generation, it would indeed have to map the AST to a validation lib at the TypeScript type level, and I'd bet that's close to impossible and risks ending up with "type instantiation is excessively deep" kind of problems.

But code generation doesn't require TS mappings, so it's more realistic, and would be very useful.

As another use case I encountered recently: I was using @anatine/zod-mock to generate fake data for tests, and also wanted to support valibot. Luckily, someone already covered that with valimock. It's probably a good tool and is enough, but you can see that some cases aren't implemented, and the fact that people have spent time implementing the same kind of work for different libraries sounds like something a good engineer would want to automate.

You could write a mapping from a library's schema to the common AST once (typebox, vine.js, effect schemas, etc.), and write a mock data library for that AST just once. The same goes for a library that generates protobufs or GraphQL types: it would be easier to generate anything, because right now you have to dig into the implementation details of a given library and support only one at a time.

So the common AST doesn't have to deal with TypeScript type mappings; it only has to be an extended version of JSON Schema. JSON Schema covers only JSON types, while validation libs can also handle Date objects, functions, other objects, custom validations (refine in zod), and various other things.
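For instance, a mock data generator written once against such an AST could look roughly like the sketch below. The `SchemaNode` shape (including the non-JSON `date` node) and the `mock` function are hypothetical; the random source is injectable so tests can make it deterministic.

```typescript
// Hypothetical AST shape with one non-JSON type (Date) to show the "extended" part.
type SchemaNode =
  | { kind: "string"; maxLength?: number }
  | { kind: "number" }
  | { kind: "date" }
  | { kind: "object"; fields: Record<string, SchemaNode> };

// Generate fake data for any node. `random` must return a number in [0, 1).
function mock(node: SchemaNode, random: () => number = Math.random): unknown {
  switch (node.kind) {
    case "string": {
      // Respect maxLength, but never generate more than 8 characters.
      const length = Math.min(node.maxLength ?? 8, 8);
      let text = "";
      for (let i = 0; i < length; i++) {
        text += String.fromCharCode(97 + Math.floor(random() * 26)); // a..z
      }
      return text;
    }
    case "number":
      return Math.floor(random() * 1000);
    case "date":
      // Some timestamp between 1970 and roughly 2001.
      return new Date(Math.floor(random() * 1e12));
    case "object": {
      const result: Record<string, unknown> = {};
      for (const [key, child] of Object.entries(node.fields)) {
        result[key] = mock(child, random);
      }
      return result;
    }
  }
}
```

With a mapping from each validation library to this AST, the generator above would not need to know which library the schema originally came from.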

ssalbdivad commented 1 month ago

This "extended version of JSON schema optimized for TS types" is essentially what @ark/schema attempts to solve. It's the internal representation used for ArkType's type system.

As is, the requirements are quite stringent around how sets of types can be organized to ensure they are always fully reduced in ArkType, but for other validators where that's not a goal, we may eventually be able to adopt a looser version of that format.

It also offers some big advantages over JSON Schema in terms of granularity, e.g. being able to assign metadata to an individual constraint like a regex as opposed to only on the root of a type itself, giving you a lot more control over error messages.
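To illustrate the granularity point only, here is a small sketch in which each constraint carries its own message. The `Constraint` and `StringNode` shapes and the `validate` function are invented for this example and are not the actual @ark/schema format.

```typescript
// Hypothetical shapes; NOT the real @ark/schema representation.
interface Constraint {
  check: (value: string) => boolean;
  message: string; // metadata attached to this one constraint
}

interface StringNode {
  kind: "string";
  description?: string; // metadata on the root of the type
  constraints: Constraint[];
}

// Return one error message per failed constraint.
function validate(node: StringNode, value: unknown): string[] {
  if (typeof value !== "string") {
    return [`expected ${node.description ?? "a string"}`];
  }
  return node.constraints
    .filter((constraint) => !constraint.check(value))
    .map((constraint) => constraint.message);
}
```

Each failing constraint reports its own message, which is the kind of control that root-level metadata alone cannot give.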

As @fabian-hiller points out though, this would be a much bigger undertaking than the original scope of the project to do well.

logaretm commented 1 month ago

Forms would benefit greatly from this. Like we have discussed in #11, this can offload some decisions to the implementing form library like default value generation.

Something I have also implemented, with varying degrees of success, in my work is "path descriptors", which are basically simple hints about a field/path.

The main use case where we actually walk the tree of different providers like Zod/Yup/Valibot is determining whether a field is required by detecting the presence of certain validators. It is not perfect and doesn't cover custom validators, but many people liked it because it considerably reduces UI verbosity by using the schema as the source of truth.

Another use-case is form generation given a schema, but maybe I will create a specific discussion for this once I gather my thoughts on it.

I initially wanted to build more on this concept by extracting other metadata like maxLength and other validators that can be represented as UI hints (e.g. "x characters remaining"), or by upgrading validators to active constraints, in other words actively preventing more characters from being entered. I think an AST can allow something like that because it is possible with every schema provider I have tried so far.
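A sketch of how such path descriptors might be derived from a shared AST is below. The `SchemaNode` and `PathDescriptor` shapes and the `describePaths` function are assumptions for illustration and do not reflect any existing form library's implementation.

```typescript
// Hypothetical AST shape with an `optional` flag on every node.
type SchemaNode =
  | { kind: "string"; maxLength?: number; optional?: boolean }
  | { kind: "number"; optional?: boolean }
  | { kind: "object"; fields: Record<string, SchemaNode>; optional?: boolean };

// Simple hints a form library could read for a single field path.
interface PathDescriptor {
  required: boolean;
  maxLength?: number;
}

// Walk the AST and collect one descriptor per leaf path, e.g. "user.name".
function describePaths(
  node: SchemaNode,
  prefix = "",
  out: Record<string, PathDescriptor> = {},
): Record<string, PathDescriptor> {
  if (node.kind === "object") {
    for (const [key, child] of Object.entries(node.fields)) {
      describePaths(child, prefix ? `${prefix}.${key}` : key, out);
    }
    return out;
  }
  const descriptor: PathDescriptor = { required: !node.optional };
  if (node.kind === "string" && node.maxLength !== undefined) {
    descriptor.maxLength = node.maxLength; // enables "x characters remaining" hints
  }
  out[prefix] = descriptor;
  return out;
}
```

A form library could use `required` to render an asterisk and `maxLength` to show a counter or to set the input's `maxlength` attribute, without walking each provider's internals.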

If this is out of scope for now then it is fine, but it would go a long way to furthering custom logic and decision making for extended libraries like forms. Ideally I want to only support this standard rather than maintain different implementations with inconsistent feature parity.

fabian-hiller commented 1 month ago

Once the standard spec is in v1, we can start thinking about what would be the right approach for a standard TS schema AST. Basically, it could live parallel to the spec in a @standard-schema/ast package instead of being included in the spec.

datner commented 4 weeks ago

I don't think it's the right approach to leave this as an afterthought. The way the spec is currently structured is by declaring some shared interface and calling it a day. That is useful, but it's nothing that trivial adapters don't already solve.

Defining an AST would surface a lot of hard questions that are not immediately evident. That does not mean committing to that AST. But working from the technical requirements toward an interface (like a spec), rather than from an interface toward the technical requirements (like a library), would avoid ending up with a v1 that is a fundamentally arbitrary shared interface and a v2 that instantly deprecates it. v1 would take longer to cook but would be significantly more future-looking.

fabian-hiller commented 4 weeks ago

From my experience as the author of Modular Forms and Valibot, who regularly talks to other framework and library authors, I think the Standard Schema spec is exactly what most frameworks and libraries are looking for right now to simplify compatibility with schema libraries.

I think a standard AST is mainly interesting for form libraries and AI SDKs. Also, it is much harder, and in some cases probably impossible, to define such an AST that works for everyone, and to convince most schema libraries to rewrite their code. For this reason, the AST requires much more effort and time, which I don't have. Therefore, it is a lower priority for me at the moment.

However, feel free to talk to other schema library authors and write an initial proposal for a standard AST. Perhaps you could also lead this effort.
