Can Date be cast from string/number?

sinclairzx81 / typebox

Json Schema Type Builder with Static Type Resolution for TypeScript


Can Date be cast from string/number? #343

Closed joezappie closed 1 year ago

joezappie commented 1 year ago

I'd like my frontend to be able to send dates as an ISO string or milliseconds and have them converted to a Date object on the server. My server may also generate some data that needs to be validated, and when it comes from there, it's already a Date.

I tried adding a keyword to do the casting from string/number, but since Type.Date requires a type object it fails. I've always found it annoying that AJV does the type check before running keywords. I've been trying to add my own custom Date type to typebox but haven't had much luck there. Would be nice if casting for this was supported out of the box (may already be but I haven't found documentation on how to do it). Something like Type.Date({ cast: true }).

I do want to avoid doing a Type.Union([Type.String({ castDate: true }), Type.Number({ castDate: true }), Type.Date()]) as I use dates in pretty much every model I have.

joezappie commented 1 year ago

I did just read through your comment in Fastify's repo stating Type.Date is deprecated and we should make our own custom types if we want a date:

https://github.com/fastify/fastify/discussions/3357#discussioncomment-4241449

I'm trying to do that now, but I'm having issues:

const Date2 = TypeSystem.CreateType('Date2', (options, value) => {
  if (options.cast) {
    if (isNaN(parseFloat(value)) === false) {
      value = new Date(parseFloat(value));
    } else if (typeof value === 'string') {
      value = new Date(value);
    }
  }
  return value instanceof Date && isNaN(value) === false;
});

console.log(Date2);

const T = TypeCompiler.Compile(Date2);

const R = Value.Check(T, new Date());
console.log(R);

When running this, I'm getting an error TypeGuardUnknownTypeError: TypeGuard: Unknown type. My project is not written in TypeScript; is that causing the issue? Also, can types modify the value?

joezappie commented 1 year ago

Been doing a lot of reading and found your comments on #306 about wanting to leave serialization/coercion out of scope for this project, and I totally understand your reasoning for it. Sounds like my best option is to just stick with doing a Union with an ajv keyword, since that can modify data. Making it a reusable function so I don't have to type that out every time is, I think, my best solution:

const DateType = (options) => {
  const dateObject = Type.Object({}, { additionalProperties: false, coerceDate: true });
  if (options?.coerce) {
    return Type.Union([Type.String({ coerceDate: true }), Type.Number({ coerceDate: true }), dateObject]);
  } else {
    return dateObject;
  }
};

const keywordDate = {
  keyword: 'coerceDate',
  type: ['string', 'number', 'object'],
  modifying: true,
  validate(keyword, data, metadata, context) {
    if (isNaN(parseFloat(data)) === false) {
      context.parentData[context.parentDataProperty] = new Date(parseFloat(data));
    } else if (typeof data === 'string') {
      context.parentData[context.parentDataProperty] = new Date(data);
    }

    const date = context.parentData[context.parentDataProperty];
    return date instanceof Date && isNaN(date) === false;
  },
};
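
One detail worth watching in the keyword above: parseFloat accepts a numeric prefix, so an ISO string such as '2023-01-15T00:00:00Z' parses as 2023 and becomes a Date two seconds after the epoch. Below is a minimal sketch of a stricter coercion that looks at the input's type before parsing. The coerceToDate name is hypothetical and not part of TypeBox or Ajv; treating purely numeric strings as milliseconds is an assumption about the desired behaviour.

```typescript
// Hypothetical helper: returns a valid Date, or undefined when the input cannot be coerced.
function coerceToDate(input: unknown): Date | undefined {
  let date: Date
  if (input instanceof Date) {
    date = input
  } else if (typeof input === 'number') {
    date = new Date(input) // milliseconds since the epoch
  } else if (typeof input === 'string') {
    // Assumption: purely numeric strings are milliseconds; anything else goes to the Date parser.
    const text = input.trim()
    date = /^-?\d+(\.\d+)?$/.test(text) ? new Date(Number(text)) : new Date(text)
  } else {
    return undefined
  }
  // An invalid Date has a NaN timestamp.
  return Number.isNaN(date.getTime()) ? undefined : date
}

console.log(coerceToDate('2023-01-15T00:00:00Z')?.toISOString()) // 2023-01-15T00:00:00.000Z
console.log(coerceToDate('1673740800000')?.toISOString())        // 2023-01-15T00:00:00.000Z
console.log(coerceToDate('not a date'))                          // undefined
```

Inside an Ajv keyword, the result could be written back through context.parentData exactly as the original code does, with an undefined result mapped to a validation failure.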

Curious if you still have any other suggestion on this, and if not please close the issue.

sinclairzx81 commented 1 year ago

@jrj2211 Hi, Sorry for the delay, there's a few comments, here....

> Been doing a lot of reading and found your comments on https://github.com/sinclairzx81/typebox/issues/306 about wanting to leave serialization/coercion as out of scope for this project and totally understand your reasoning for it.

TypeBox only supports a very limited form of value coercion by way of the Value.Cast function. The conversion logic for cast is very strict and internal to the Cast module (so can't be extended). It doesn't support Date to number | string conversions (but should support number | string to Date). The current value coercion logic has been trialed for around 6 months in TypeBox, but there's been a ton of cases highlighted (including yours) where I do think TypeBox could be providing better programmable support for value conversion....so, this aspect is now considered to be in scope, but may take some time before a formal release.

0.26.0 Changes

To better address value conversion, encoding, and coercion, Value.Cast no longer performs value conversion internally in 0.26.0, and a new Value.Convert module is being added (which is where the Cast conversion logic will be moved to). There is some thought currently going into the future design of Value.Convert, but in the short term, 0.26.0 will support the following immediate API.

import { Type } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

// current 0.26.0 implementation

const T = Type.Number()

const N = Value.Convert(T, '42') // try convert '42' to number

const R = Value.Check(T, N) // we need to check the conversion was successful

This won't immediately support Date to string | number conversions; however, pulling the Convert logic out of Cast frees up the implementation for newer, better APIs to handle user-defined coercion rules.

Future Conversion API

The following are the current thoughts going into a future conversion API (very provisional)

import { Value, Conversion } from '@sinclair/typebox/value'

// -------------------------------------------------------------------
//  New: Conversion Class
// -------------------------------------------------------------------
const conversion = new Conversion()

conversion.Set('Date', (value) => (value instanceof Date) ? value.getTime() : value)

const N = Value.Convert(Type.Date(), new Date(), conversion)  // converts Date to number

const R1 = Value.Check(Type.Date(), N)                        // fail: N is not a Date

const R2 = Value.Check(Type.Number(), N)                      // ok:   N is a number

Notes on CreateType

On your CreateType implementation, just be aware that the check callback should treat the value as immutable. By design, TypeBox treats check and convert as distinct operations.

const Date2 = TypeSystem.CreateType('Date2', (options, value) => {
  if (options.cast) {
    if (isNaN(parseFloat(value)) === false) {
      value = new Date(parseFloat(value)); // invalid: cannot re-assign immutable value
    } else if (typeof value === 'string') {
      value = new Date(value); // invalid: cannot re-assign immutable value
    }
  }
  return value instanceof Date && isNaN(value) === false;
});
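
The separation described here can be sketched without TypeBox at all: a check is a pure predicate, a convert returns a (possibly new) value, and the caller composes the two. The names below (checkDate, convertDate, convertThenCheck) are illustrative assumptions, not TypeBox API.

```typescript
type Check<T> = (value: unknown) => value is T
type Convert = (value: unknown) => unknown

// Pure predicate: inspects the value, never reassigns or mutates it.
const checkDate: Check<Date> = (value): value is Date =>
  value instanceof Date && !Number.isNaN(value.getTime())

// Conversion: returns a new value when it can, otherwise hands the input back untouched.
const convertDate: Convert = (value) =>
  typeof value === 'number' || typeof value === 'string' ? new Date(value) : value

// Compose the two steps; the check still decides whether the result is acceptable.
function convertThenCheck<T>(convert: Convert, check: Check<T>, value: unknown): T | undefined {
  const converted = convert(value)
  return check(converted) ? converted : undefined
}

console.log(convertThenCheck(convertDate, checkDate, 0)?.toISOString()) // 1970-01-01T00:00:00.000Z
console.log(convertThenCheck(convertDate, checkDate, 'nope'))           // undefined
```

Keeping the predicate free of side effects means the same check can be reused on values that never went through conversion.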

I will include some documentation on this in later revisions (there are some changes in 0.26.0 around custom types also, so just be mindful of this). Pending release notes for 0.26.0 can be found at https://github.com/sinclairzx81/typebox/blob/next/changelog/0.26.0.md

Summary

Unfortunately, TypeBox doesn't have any facilities to support value conversions in the way you describe, so my recommendation would be to use Ajv until some of the new value conversion logic lands in the 0.26.x revisions.

Hope this helps S

joezappie commented 1 year ago

Hi @sinclairzx81. Thanks for the detailed response! That all sounds great and glad to hear this is becoming in scope of the project. Since I've got my method for using AJV keywords I'm all set for the moment. I'm looking forward to it and awesome work on the library. Makes dealing with AJV so much more bearable.

Quick question from your example for the future API: how do you anticipate the convert functionality being used in a schema where you're not immediately trying to convert a value? For example, given this schema, I'd want to add a conversion to my date field, but the value isn't known until some data is passed into ajv for validation:

const conversion = new Conversion()
conversion.Set('Date', (value) => (value instanceof Date) ? value.getTime() : value)

const C = ajv.compile(Type.Object(
  {
    _id: Type.ObjectId(),
    date: Value.Convert(Type.Date(), conversion), // <-------
    entries: Type.Array(Type.Number()),
  },
));

const R = C({ date: new Date(), entries: [1,2,3] })

Or will you create a new Type and give it the conversion object there?

sinclairzx81 commented 1 year ago

@jrj2211 Hi,

The initial thinking around this is that the Conversion module will only work for the new Value.Convert(T, value, conversion) function. So any enhancements here won't be applicable to Ajv (so you will need to continue to use custom configs or the coerceTypes config if using Ajv)

> Quick question from your example for the future API: How do you anticipate the convert functionality to be used in an schema where you're not immediately trying to convert a value?

This will largely be an implementation detail, but the conversion logic for a target type should try to convert the value if it can OR just return the value if it cannot (indicating no conversion is possible).
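
A minimal sketch of that rule, assuming a registry keyed by a type's kind name. The Conversion class here is a stand-in written for illustration; it does not reflect a released TypeBox API, and its Convert method takes a kind string instead of a schema to keep the example self-contained.

```typescript
type Converter = (value: unknown) => unknown

// Hypothetical stand-in for the proposed Conversion class.
class Conversion {
  private readonly converters = new Map<string, Converter>()

  Set(kind: string, converter: Converter): this {
    this.converters.set(kind, converter)
    return this
  }

  // Apply the converter registered for `kind`; with none registered, return the value as-is.
  Convert(kind: string, value: unknown): unknown {
    const converter = this.converters.get(kind)
    return converter ? converter(value) : value
  }
}

// Convert if possible, otherwise pass the value through unchanged.
const conversion = new Conversion().Set('Date', (value) =>
  typeof value === 'number' ? new Date(value) : value,
)

const converted = conversion.Convert('Date', 0)       // a Date
const untouched = conversion.Convert('Date', 'hello') // still 'hello'; a later check would reject it
console.log(converted, untouched)
```

Because an unconvertible value comes back unchanged, the conversion step never needs to report errors itself; a subsequent check does that.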

Codecs

There's actually a broader scope to value conversion, and it has a lot of crossover with implementing a codec system for TypeBox (which is also being considered). The following is something I think could be achieved with a more fleshed-out Convert API, where value coercion is merely a codec one writes to remap values to their target types.

The following is another concept implementation which serializes Date to { timestamp: 123 }

interface EncodedDate { timestamp: number } // how we serialize dates to JSON

// ---------------------------------------------------------
// Encoder
// ---------------------------------------------------------
const encoder = new Conversion()          // maybe new Codec()?
encoder.Set('Date', (value: unknown) => { // target type: Date
   if(value instanceof Date) return { timestamp: value.getTime() }
   return value // just return
})

// ---------------------------------------------------------
// Decoder
// ---------------------------------------------------------
const decoder = new Conversion()          // maybe new Codec()?
decoder.Set('Date', (value: unknown) => { // target type: Date
   if(typeof value === 'object' && value !== null && typeof (value as any).timestamp === 'number') {
     return new Date(value.timestamp)
   }
   return value // just return
})

// ---------------------------------------------------------
// Example
// ---------------------------------------------------------
const encoded = Value.Convert(Type.Date(), new Date(), encoder)        // const encoded = { timestamp: 1 }
const decoded = Value.Convert(Type.Date(), { timestamp: 1 }, decoder)  // const decoded = new Date(1)

For value checking, I may consider additional value guards to make writing codecs a bit simpler.

const decoder = new Conversion()          // maybe new Codec()?
decoder.Set('Date', (value: unknown) => { // target type: Date
   if(Value.IsObject(value) && Value.IsNumber(value.timestamp)) {
     return new Date(value.timestamp)
   }
   return value // just return
})

These are just some initial thoughts on how such an API might look. I'll put some thought into this over the course of 0.26.0 with possible release in 0.27.0 or sooner if it's possible to implement without breaking changes.

Hope this gives a bit more insight into the thinking here :) Cheers! S

chrisui commented 1 year ago

Very interested in a general codec solution, as how best to implement serialisation/parsing/encoding/decoding efficiently with typebox is still a bit of an unsolved problem.

A few thoughts from above examples. Wonder if you've progressed thinking here any further?

- Being more concrete with new Codec seems preferable to me.
- Given your other API patterns, I'm surprised you wouldn't just have an encoder.Convert() method on the Codec/Conversion class.
- Do we need to lazily bind encoders, or can we just pass them to the constructor? E.g. new Codec({ Date: () => {} })
- Then maybe you could just have an indexed interface and allow people to extend it with method names matching their Kind names, much like TypeBuilder.
- It would be good to allow an encoder/decoder to return/signal an error.
- Then it would be good to allow the Convert method to either fail early or encode/decode lossily, with a way to get the paths that failed.
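
To make a few of these suggestions concrete, here is a rough sketch of a codec that takes its handlers in the constructor and reports failures by path instead of throwing. Everything in it (Codec, CodecResult, the Field and handler shapes) is hypothetical and meant only as discussion material, not as a proposal for the actual TypeBox design.

```typescript
type CodecResult = { ok: true; value: unknown } | { ok: false; message: string }
type Handler = (value: unknown) => CodecResult
interface Field { kind: string }

class Codec {
  private readonly handlers: Record<string, Handler>

  constructor(handlers: Record<string, Handler>) {
    this.handlers = handlers
  }

  // Convert each field by its kind. Failed fields keep their original value (lossy mode)
  // and their paths are collected so the caller can decide whether to fail early.
  Convert(fields: Record<string, Field>, input: Record<string, unknown>) {
    const value: Record<string, unknown> = {}
    const errors: { path: string; message: string }[] = []
    for (const [key, field] of Object.entries(fields)) {
      const handler = this.handlers[field.kind]
      const result: CodecResult = handler ? handler(input[key]) : { ok: true, value: input[key] }
      if (result.ok) {
        value[key] = result.value
      } else {
        value[key] = input[key]
        errors.push({ path: `/${key}`, message: result.message })
      }
    }
    return { value, errors }
  }
}

const decoder = new Codec({
  Date: (value) => {
    const date = typeof value === 'number' || typeof value === 'string' ? new Date(value) : undefined
    return date !== undefined && !Number.isNaN(date.getTime())
      ? { ok: true, value: date }
      : { ok: false, message: 'Expected ISO string or milliseconds' }
  },
})

const fields = { name: { kind: 'String' }, createdAt: { kind: 'Date' }, updatedAt: { kind: 'Date' } }
const result = decoder.Convert(fields, { name: 'a', createdAt: 0, updatedAt: 'bogus' })
console.log(result.errors) // one entry, for /updatedAt
```

A fail-early mode would be the same loop returning on the first error; the lossy variant shown here keeps going and lets the caller inspect the collected paths.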

sinclairzx81 commented 1 year ago

@chrisui Heya

Yeah, I still don't know what I want to do here (or if TypeBox should be supporting codecs at all). On one side, there are use cases where users need value coercion (and where this functionality naturally sits with the Value.* API); on the other side, there's potential to vastly accelerate message encode/decode (JSON, CBOR acceleration is of considerable interest to me), for which a codec system would be the preference.

Currently, given value coercion could technically be implemented through a codec system; this has me idling on doing something here in the short term. Mostly, the apprehension to move on this comes down to adding a codec system which would partially couple message encode/decode to TypeBox; this when more simple encode/decode + validate can be handled so trivially without the sophistication of having users adopt whatever mechanisms the codec system would mandate. The simplest solution is usually the best, so really just need strong justification for adding such a system.

I am planning on adding something here (specifically for customized value coercion rules), but letting this one sit for the time being based on the above. Happy to discuss more tho; I think an API design discussion is probably the best path forward at this time (so open to seeing code examples, external implementations, that sort of thing).

loynoir commented 1 year ago

Hi, guys.

I think TypeBox could adapt zod's schema-transform-parse design.

const schema = z
  .union({
    ax: z.string().regex(aregx).transform_regex_groups().brand<{ ay: AOpaque }>(),
    bx: z.string().regex(bregx).transform_regex_groups().brand<{ by: BOpaque }>()
  })
  .transform_union_scenario() satisfies JsonValue

const valid = parse<boolean>(schema, val, { transformer: DEFAULT_VALIDATOR_WITH_BOOT_RETTYPE  })

type expected = { scenario: 'ax'; value: { ay: AOpaque } } | { scenario: 'bx'; value: { by: BOpaque } }

const parsed: expected = parse<expected>(schema, val)

sinclairzx81 commented 1 year ago

@loynoir

I have added an experimental implementation of type/value transformation on the transform branch for review and feedback. I've included an example on example/index.ts which demonstrates the usage. You can run this by cloning the project and running npm start while on this branch. High level changes to enable the feature can be found on this commit.

Have updated the transform type to accept both encode and decode functions (forming a codec). Both are optional. Have split Value.Transform into the functions Value.Encode and Value.Decode which calls the respective codec functions on the transform type. The type will infer as the decode function return type (if specified) or Static<T> if decode is omitted (suggesting no transformation)

The following example shows json string encode and decode implemented through transform types.

const JsonString = Type.Transform(Type.String(), {
  decode: (value) => JSON.parse(value),
  encode: (value) => JSON.stringify(value),
})

const decoded = Value.Decode(JsonString, '[1, 2, 3, 4]')
const encoded = Value.Encode(JsonString, decoded)

console.log(decoded)  // [1, 2, 3, 4]
console.log(encoded)  // "[1, 2, 3, 4]"
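
For readers who have not opened the branch, the idea can be approximated in a few lines of plain TypeScript. This is a simplified model, not the branch's implementation: transform, decodeValue, and encodeValue are hypothetical names, and the wire check is a bare predicate instead of a schema.

```typescript
interface TransformType<Wire, App> {
  check: (value: unknown) => value is Wire // validates the wire (encoded) form
  decode: (value: Wire) => App
  encode: (value: App) => Wire
}

function transform<Wire, App>(
  check: (value: unknown) => value is Wire,
  codec: { decode: (value: Wire) => App; encode: (value: App) => Wire },
): TransformType<Wire, App> {
  return { check, ...codec }
}

// Decode validates the wire value first, then applies the codec; the result is typed as App.
function decodeValue<Wire, App>(type: TransformType<Wire, App>, value: unknown): App {
  if (!type.check(value)) throw new TypeError('Value does not match the wire type')
  return type.decode(value)
}

// Encode is the inverse: application value in, wire value out.
function encodeValue<Wire, App>(type: TransformType<Wire, App>, value: App): Wire {
  return type.encode(value)
}

const isString = (value: unknown): value is string => typeof value === 'string'

const Timestamp = transform(isString, {
  decode: (value: string) => new Date(value),
  encode: (value: Date) => value.toISOString(),
})

const decoded = decodeValue(Timestamp, '2023-01-15T00:00:00.000Z') // inferred as Date
const encoded = encodeValue(Timestamp, decoded)                    // inferred as string
console.log(decoded, encoded)
```

The point of pairing the two functions on one type is that the wire shape is still what gets validated, while application code only ever sees the decoded type.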

tinchoz49 commented 1 year ago

Hi @sinclairzx81! Thank you for working on this feature. We have a project that uses this library heavily, and we think the Transform feature could help us a lot. What do you think about publishing an npm alpha version of this? We can test it in our project, where we have a lot of schemas.

ehaynes99 commented 1 year ago

> this when more simple encode/decode + validate can be handled so trivially

Sorry, but what do you mean here? I don't see any general purpose way of handling this well. Let's say all I want to do is the inverse of JSON.stringify(new Date()) (which, for the record, I suspect is at least half of why people would want conversions). I could:

bring in something like AJV. That eliminates use of TypeCompiler altogether which, in turn, eliminates use of tools like @fastify/type-provider-typebox and other framework-specific adaptations. This effectively limits TypeBox to schema definition only. This is what I'm currently (reluctantly) doing.

My AJV version, if you're interested

```typescript
import { Type } from '@sinclair/typebox'
import Ajv, { FuncKeywordDefinition } from 'ajv'
import { fullFormats } from 'ajv-formats/dist/formats'

const dateValidator: (data: string) => boolean = (fullFormats['date-time'] as any).validate

/**
 * TypeBox's 'Unsafe' allows using a different type than the one that is represented
 * in the schema. In this case, we'll convert an ISO string into a Date object.
 */
export const IsoDate = Type.Unsafe<Date>({ type: ['string', 'object'], format: 'date-time', isoDate: true })

/**
 * This is a custom keyword that will allow us to use the `date-time` format in our schemas
 * while using a JS `Date` in our code.
 *
 * Note: DO NOT add `date-time` format from `ajv-formats` to the instance of Ajv, as it will
 * conflict with validation here. Its behavior is encapsulated in this keyword.
 */
export const isoDateKeyword: FuncKeywordDefinition = {
  keyword: 'isoDate',
  modifying: true,
  error: {
    message: 'must be ISO date',
  },
  validate: (keyword, data, metadata, context) => {
    if (data instanceof Date && !isNaN(data.getTime())) {
      return true
    }
    if (typeof data !== 'string' || !dateValidator(data)) {
      return false
    }
    const date = new Date(data)
    context!.parentData[context!.parentDataProperty] = date
    return true
  },
}

export const bodyAjvInstance = new Ajv({
  allErrors: true,
  allowUnionTypes: true,
  // Must add the format here but NOT add `date-time` format from `ajv-formats`
  formats: { 'date-time': true },
  coerceTypes: false,
}).addKeyword(isoDateKeyword)
```

handle the conversions after validation. The sheer mass of redundancy is a pain, but more importantly, this ruins the "holy grail" feature of TypeBox: finally having a single type definition. I have to duplicate basically everything...

const UserSchema = Type.Object({
  // ...
  createdAt: Type.String(),
  updatedAt: Type.String(),
})

export type User = Omit<Static<typeof UserSchema>, 'createdAt' | 'updatedAt'> & {
  createdAt: Date
  updatedAt: Date
}

export const parseUser = (user: Static<typeof UserSchema>): User => {
  const { createdAt, updatedAt, ...rest } = user
  return {
    ...rest,
    createdAt: parseDate(createdAt),
    updatedAt: parseDate(updatedAt),
  }
}
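
The parseDate helper is not shown in the comment. One possible version, assuming the wire format is an ISO 8601 string and that an invalid value should raise rather than silently produce an invalid Date:

```typescript
// Assumed helper: the original comment does not include its implementation.
const parseDate = (value: string): Date => {
  const date = new Date(value)
  // new Date() never throws on bad input; it yields a Date whose timestamp is NaN.
  if (Number.isNaN(date.getTime())) {
    throw new RangeError(`Invalid date string: ${value}`)
  }
  return date
}

console.log(parseDate('2023-01-15T00:00:00.000Z').getTime()) // 1673740800000
```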

Make broad assumptions (that also drastically reduce performance):


const DATE_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/

export const jsonContentTypeParser: FastifyContentTypeParser = (request, payload, done) => {
  rawBody(
    payload,
    {
      length: request.headers['content-length'],
      limit: '1mb',
      encoding: 'utf8',
    },
    (err, body) => {
      if (err) return done(err)
      try {
        const parsed = secureParse(body, (_, value) => {
          const isDate = typeof value === 'string' && DATE_PATTERN.test(value)
          return isDate ? new Date(value) : value
        })
        done(null, parsed)
      } catch (error: any) {
        done(error)
      }
    },
  )
}
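
Stripped of the Fastify plumbing, this approach is a JSON.parse reviver that runs a regular expression over every string in the payload, which is where both the broad assumption and the cost come from. A minimal sketch using only the standard library (the reviveDates name is illustrative):

```typescript
// Matches the exact output shape of Date.prototype.toISOString(), e.g. 2023-01-15T00:00:00.000Z
const ISO_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/

// Reviver: called for every key/value pair, so every string value pays for the regex test.
const reviveDates = (_key: string, value: unknown): unknown =>
  typeof value === 'string' && ISO_DATE_PATTERN.test(value) ? new Date(value) : value

const payload = JSON.stringify({ name: 'Joe', createdAt: new Date(0), note: '2023-01-15' })
const parsed = JSON.parse(payload, reviveDates)

console.log(parsed.createdAt instanceof Date) // true
console.log(typeof parsed.note)               // string: date-only text does not match the pattern
```

The reviver has no knowledge of the schema, so any string field that happens to hold an ISO timestamp is converted too, even where the schema expects a plain string.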

sinclairzx81 commented 1 year ago

@ehaynes99 Hey

> Sorry, but what do you mean here? I don't see any general purpose way of handling this well. Let's say all I want to do is the inverse of JSON.stringify(new Date()) (which, for the record, I suspect is at least half of why people would want conversions)....

Mostly I mean that when you write your schematics, they are representative of the wire encoding (or whatever is encodable in JSON), with the decoding generally happening in your route handlers (assuming you're writing web servers with TB). This is trivial to do manually, but certainly annoying for general use cases (like decoding numbers as Dates (and vice versa)), the apprehension mostly stems from adding yet more infrastructure to TB to solve something that can be handled fairly easily (but I still want to provide "something" that doesn't tax the library too hard)

> bring in something like AJV. That eliminates use of TypeCompiler altogether which, in turn, eliminates use of tools like @fastify/type-provider-typebox and other framework-specific adaptations. This effectively limits TypeBox to schema definition only. This is what I'm currently (reluctantly) doing.

TypeBox is principally written to be a schema definition builder and static type resolver library first and foremost (so not specifically a validator). The TypeCompiler and Value modules are provided as optional extras if you need them (they're what I use), but they are certainly not intended to be a substitute for Ajv (or any other JSON Schema validator), especially if you're already on infrastructure that is utilizing Ajv (such as Fastify). In this regard, it's not redundant to use TB just for schema construction; it's what this library is designed for (and where most of the effort goes).

Also FYI, you can use the Fastify Type Provider without configuring .setValidationCompiler(). I believe the documentation mentions it's an optional thing you can use. I added the compiler option to the provider because it can speed up application boot time when you have many schematics to compile (as the TB JIT compiler is very fast), but again, you don't have to use it.

> handle the conversions after validation. The sheer mass of redundancy is a pain, but more importantly, this ruins the "holy grail" feature of TypeBox: finally having a single type definition. I have to duplicate basically everything...

The prototype on the transform branch is intended to solve this problem by letting you describe the wire encoding (i.e. number) and the inference type (i.e. Date). However, you will need to use the Value.Encode and Value.Decode functions (implemented on the branch) to handle decoding the value, either in your route handler (manual) or integrated into framework validation pipelines (automatic).

Note the implementation is merely a prototype (and it may not make it into the library, as I'm not very keen on augmenting schematics with codec functions, or having the indirect coupling between Type and Value), but it's the best design I've managed to come up with thus far, and one that can be integrated into validation pipelines (such as the validation pass on the Fastify typebox-type-provider).

I am open to thoughts and feedback on the implementation, but letting it settle for a while (mostly because I'm currently focusing on other projects and don't have the bandwidth to fully implement at this time)

Hope that brings some insight Cheers!

sinclairzx81 commented 1 year ago

@tinchoz49 Hi!

I may be able to investigate a -dev npm package for some of the transform work in a few weeks time (tentative). The current implementation needs a lot of work (and tests written) and is very much a prototype just to explore the implementation. I do invite people to try out what's on the transform branch and give feedback tho.

Will notify on this thread once I'm able to make the prototype a bit more formal and ready for a -dev publish. Cheers!

ehaynes99 commented 1 year ago

Completely understandable. You've been beyond responsive on this project. I've tried out the transform branch, and overall, I think it's more than suitable for what I would need.

FWIW, I think TypeCompiler is a nicer api than that of AJV. Check is more performant for the happy path -- which should be the overwhelming majority of cases -- and I don't love the mutable nature of their validators, though that's more of a philosophical issue than a real problem.

Most importantly, though, I've spent years battling this overarching problem of needing a generalized way to introduce external input into the TypeScript type system safely. While I am often writing "web servers", I've grown to really dislike the term. My applications always have myriad transport layers beyond HTTP endpoints like queues, structured data in caches, or RPC actions that hide away the details of the transport into a simple request/response paradigm. Libraries like fastify that marry validation to a particular type of communication are missing the larger problem.

Some small bits of feedback:

- Value.Encode returns the wrong type currently (same type as the decoded value), but I assume that's by virtue of it being a prototype.
- I would be interested to see if it would replace Value.Convert, or if they would coexist. In general, I don't really want to accept those kinds of payloads, but there are spaces like parsing query/path parameters where it's useful. I suppose a simple NumberParam transform type would be sufficient, though. It's perfectly reasonable to say that that's outside of the scope of a JSON validator, but it would still be nice to have. I respect the difficulty of the divide between defining JSON schemas and validating JavaScript objects. It's a very fine line.

tzelon-cypator commented 1 year ago

I have a similar issue with BigInt. As JSON.parse cannot really handle bigints, we use strings to represent them. So, to convert a string bigint to a real bigint, I cloned the Convert code and added a new case for my BigintString type.
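
The same pattern in isolation, as a sketch: JSON.stringify throws on a bigint, so the value travels as a decimal string and is converted at the boundary. The helper names are hypothetical, and this is not the modified Convert code mentioned above.

```typescript
const BIGINT_STRING = /^-?\d+$/

// Wire form -> application form. Returns the input unchanged when it is not a bigint string.
const decodeBigInt = (value: unknown): unknown =>
  typeof value === 'string' && BIGINT_STRING.test(value) ? BigInt(value) : value

// Application form -> wire form.
const encodeBigInt = (value: bigint): string => value.toString()

// 9007199254740993 is 2^53 + 1, which a JS number cannot represent exactly.
const wire = JSON.stringify({ balance: encodeBigInt(BigInt('9007199254740993')) })
const balance = decodeBigInt(JSON.parse(wire).balance)

console.log(typeof balance) // bigint
```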

chrisui commented 1 year ago

Slightly lost on the utility of the example provided. Would it not be more useful to provide an encoder/decoder for a kind than a specific type? It's not immediately obvious how you would re-use and compose transformers for more complex types.

Dinero snapshots work as a great non-trivial example for serialisations imo if one is needed.

ehaynes99 commented 1 year ago

@chrisui Perhaps your confusion is calling it a "transformer". Maybe TransformableType would be more accurate. They're composed in the same way that you would any other type. The type checker checks the serialized type, and Value.Decode converts it to the realized type.

import { Static, Type } from '@sinclair/typebox'
import { TypeCompiler } from '@sinclair/typebox/compiler'
import { Value } from '@sinclair/typebox/value'
import { inspect } from 'node:util'

const UnixTime = Type.Transform(Type.Number(), {
  decode: (n) => new Date(n * 1000),
  encode: (d) => d.getTime() / 1000,
})

// type UnixTime = Date
type UnixTime = Static<typeof UnixTime>

const Customer = Type.Object({
  name: Type.String(),
  email: Type.String(),
  dob: UnixTime,
})

// type Customer = {
//   name: string;
//   email: string;
//   dob: Date;
// }
type Customer = Static<typeof Customer>

const customerChecker = TypeCompiler.Compile(Customer)

// customerChecker:
// return function check(value) {
//   return (
//     (typeof value === 'object' && value !== null && !Array.isArray(value)) &&
//     (typeof value.name === 'string') &&
//     (typeof value.email === 'string') &&
//     (typeof value.dob === 'number' && Number.isFinite(value.dob))
//  )
// }
console.log('***** customerChecker:', customerChecker.Code())

const customer: Customer = {
  name: 'Joe Schmoe',
  email: 'joe.schmoe@example.com',
  dob: new Date('1985-10-26'),
}

// currently incorrectly typed as `Customer`
const encoded = Value.Encode(Customer, customer)

// encoded: {
//   name: 'Joe Schmoe',
//   email: 'joe.schmoe@example.com',
//   dob: 499132800,
// }
console.log('encoded:', encoded)

// true
console.log(customerChecker.Check(encoded))

const json = JSON.stringify(encoded)

const parsed = JSON.parse(json)
// true
console.log(customerChecker.Check(parsed))

const decoded = Value.Decode(Customer, parsed)

// decoded: {
//   name: 'Joe Schmoe',
//   email: 'joe.schmoe@example.com',
//   dob: 1985-10-26T00:00:00.000Z
// }
console.log('decoded:', decoded)

// checker checks SERIALIZED type
const errors = [
  ...customerChecker.Errors({
    name: 'Joe Schmoe',
    email: 'joe.schmoe@example.com',
    dob: new Date('1985-10-26'),
  }),
]

// errors: [
//   {
//     type: 29,
//     schema: {
//       type: 'number',
//       [Symbol(TypeBox.Kind)]: 'Number',
//       [Symbol(TypeBox.Transform)]: { encode: [Function: encode], decode: [Function: decode] }
//     },
//     path: '/dob',
//     value: 1985-10-26T00:00:00.000Z,
//     message: 'Expected number'
//   }
// ]
console.log('errors:', inspect(errors, { depth: null }))

sinclairzx81 commented 1 year ago

Hi everyone!

Hey, I've just published Revision 0.30.0, and along with many other updates in this revision, I spent quite a bit of time reviewing this Type.Transform type, but have decided to leave it out of this revision ... for now.

I think the Type.Transform implementation was close to an ideal solution, but since have reimplemented it outside of TypeBox's type system and have decided to treat it as a dedicated codec system unto itself (which needs further development and some integration considerations ironed out)

Moving forward (and for the next few releases), I will be including the transform type as an example single-file module that users can copy and paste into their projects and use that way. The module contains the implementation and API that TypeBox would ultimately get should it be included in future, but it is offered in this fashion to encourage users to experiment and provide feedback (as well as submit bug fixes). I'm most interested in getting end user feedback on framework integration (specifically handling auto inference for input and output through transforms), so any feedback on this aspect would be most welcome.

The implementation and documentation can be found at the following example URL

https://github.com/sinclairzx81/typebox/tree/master/examples/transform

Release notes for Revision 0.30.0 can be found at the following link also

https://github.com/sinclairzx81/typebox/blob/master/changelog/0.30.0.md

Am going to close off this issue for now, but may reopen again in future once transforms are a little more ready for showtime.

Thanks all S

ehaynes99 commented 1 year ago

The static derivation feels inverted to me... Shouldn't the result of Static be the decoded value? That's the type one would use throughout the entire codebase. E.g. from the example:

const Timestamp = Transform(Type.Number(), {
  decode: (value) => new Date(value),
  encode: (value) => value.getTime(),
})

type N = Static<typeof N>
const N = Type.Object({
  timestamp: Timestamp,
})

The type N becomes:

type N = {
    timestamp: number;
}

There would effectively never be a place in the code where I would want such an object. Everywhere in the entire application, I would want a:

type N = {
    timestamp: Date;
}

This basically means that application code could never use Static again, but rather would need to do this for every single model:

type N = Static<TransformUnwrap<typeof Timestamp>>

The only exceptions I can come up with would be buried deep inside of a library/framework. Should it not, then, be the exceptional case to derive the serialized type? E.g.

// fictitious
type SerializedN = Static<TransformWrap<typeof Timestamp>>

sinclairzx81 commented 1 year ago

@ehaynes99 Heya!

> The static derivation feels inverted to me... Shouldn't the result of Static be the decoded value? That's the type one would use throughout the entire codebase. E.g. from the example:

Yeah, this has been difficult. I've since changed things up from the transform prototype while taking a much deeper look at this (noting the prototype did infer as the decoded value) and just moved all inference responsibility into examples/transform/transform.ts. In a nutshell, I didn't think it was correct to infer as the decoded value when the schematics may be representing some other value (also, mandating a transform type be decoded just to get the static inference aligned didn't feel quite right either)

The examples/transform takes a different approach where Static will always infer as the encoded type (no changes required to Static), but where the return value of Decode gives the correct value and type. In this respect, Decode returns the computed transform type and Encode just returns Static (a bit confusing, but I will probably include an EncodeStatic<T> and DecodeStatic<T> to make it clearer that transform inference needs to work outside TB's typical Static).
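
As a sketch of what such helper types might look like, assuming a transform is modelled as an object carrying its wire and application types. This is not how examples/transform is implemented; TransformOf, EncodeStatic, and DecodeStatic below are hypothetical definitions written for illustration.

```typescript
// A transform carries both the encoded (wire) type and the decoded (application) type.
interface TransformOf<Encoded, Decoded> {
  decode: (value: Encoded) => Decoded
  encode: (value: Decoded) => Encoded
}

// Recursively pick the encoded side of every transform found in an object of fields.
type EncodeStatic<T> =
  T extends TransformOf<infer E, any> ? E :
  T extends object ? { [K in keyof T]: EncodeStatic<T[K]> } : T

// Same walk, picking the decoded side instead.
type DecodeStatic<T> =
  T extends TransformOf<any, infer D> ? D :
  T extends object ? { [K in keyof T]: DecodeStatic<T[K]> } : T

const Timestamp: TransformOf<number, Date> = {
  decode: (value) => new Date(value),
  encode: (value) => value.getTime(),
}

const N = { timestamp: Timestamp }

type WireN = EncodeStatic<typeof N> // { timestamp: number }
type AppN = DecodeStatic<typeof N>  // { timestamp: Date }

const wireValue: WireN = { timestamp: 0 }
const appValue: AppN = { timestamp: Timestamp.decode(wireValue.timestamp) }
console.log(wireValue, appValue)
```

With both helpers available, application code can name the decoded type directly while framework integrations keep using the encoded one.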

The reasoning behind this mostly comes from looking at the zod .parse() function, and where Decode() is essentially just a TB version of parse() (both of which yield a computed type + value as a result of decoding/parsing). However TB is sorta sitting somewhere between zod and io-ts where Encode() is thought of as the direct inverse of parse().
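
To make the encoded/decoded split concrete, here is a minimal sketch in plain TypeScript. It is not TypeBox's implementation: `Codec`, `EncodedOf`, `DecodedOf` and `TimestampCodec` are hypothetical names made up for illustration, standing in for a transform type and for the EncodeStatic<T> / DecodeStatic<T> helpers mentioned above.

```typescript
// Hypothetical stand-in for a transform type; not TypeBox's actual implementation.
interface Codec<Wire, App> {
  decode: (wire: Wire) => App // wire value -> application value
  encode: (app: App) => Wire // application value -> wire value
}

// Type-level helpers in the spirit of the proposed EncodeStatic<T> / DecodeStatic<T>.
type EncodedOf<C> = C extends Codec<infer Wire, any> ? Wire : never
type DecodedOf<C> = C extends Codec<any, infer App> ? App : never

const TimestampCodec: Codec<number, Date> = {
  decode: (ms) => new Date(ms),
  encode: (date) => date.getTime(),
}

type WireTimestamp = EncodedOf<typeof TimestampCodec> // number (what goes over the wire)
type AppTimestamp = DecodedOf<typeof TimestampCodec> // Date (what application code uses)

// The decoded type only appears after an explicit decode step.
const wireValue: WireTimestamp = 0
const appValue: AppTimestamp = TimestampCodec.decode(wireValue)
const roundTripped: WireTimestamp = TimestampCodec.encode(appValue)
```

In this sketch the wire type plays the role of Static, and the application type is only reachable through the decode call, which mirrors the parse()-style reasoning above.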

There would effectively never be a place in the code where I would want such an object. Everywhere in the entire application, I would want a:

Yeah, I've had a look at this also, and reasoned that end users "should" see the encoded value (i.e. the wire value), and that wire values should be explicitly decoded into application values. I do agree, this should be automatic for good DX, it's just that frameworks today don't typically provide mechanisms to support bi-directional type codecs (and in the documentation for example/transform, I've included write ups and anticipated Fastify usage under their current type infrastructure (which would also be applicable to tRPC)). As for the explicit decode, I feel this is better as something is going to need to decode these values and observe the output types (be it users or framework plugins), and it's the explicit decode that gives framework integrators some leverage to align types in their respective frameworks (rather than having all the auto type decoding locked up inside of TypeBox).

Much of this is fairly complicated to reason about (it's been challenging) and nothing is final yet (hence why I've been unable to include transforms in the 0.30.0 Revision), but I think the explicit decode requirement (as per example) is a step towards automatic decode in future, it just needs experimentation and feedback from users trying to integrate in their projects.

Hope this brings some more insight into where things are at with this feature Cheers! S

ehaynes99 commented 1 year ago

EncodeStatic<T> and DecodeStatic<T> would definitely be nice, or something for the lazy like Encoded<T> and Decoded<T>.

Since you bring up zod, I'm sure you're familiar, but all of the zod types carry around Input and Output types that just happen to be the same for types with no transformation. The top-level utilities exposed for these are:

export type TypeOf<T extends ZodType<any, any, any>> = T["_output"];
export type input<T extends ZodType<any, any, any>> = T["_input"];
export type output<T extends ZodType<any, any, any>> = T["_output"];
export type { TypeOf as infer };

I'm not saying that you would need to follow that pattern for the representation, but I think the utilities behave in an intuitive way. In the very beginning of the intro, they demonstrate z.infer as effectively "get the type of the thing you're going to use in the code". I have viewed TypeBox's Static as synonymous with that, declaring a type of the same name right next to every schema.

Beyond principle of least surprise, however, I can't actually migrate to this. Currently, I have a shared library using ajv doing the conversion and exposing a typebox type like:

export const IsoDate = Type.Unsafe<Date>({ type: ['string', 'object'], format: 'date-time', isoDate: true })

ajv handler

```typescript
import { FuncKeywordDefinition } from 'ajv'
import { fullFormats } from 'ajv-formats/dist/formats'

const dateValidator: (data: string) => boolean = (fullFormats['date-time'] as any).validate

/**
 * This is a custom keyword that will allow us to use the `date-time` format in our schemas
 * while using a JS `Date` in our code.
 *
 * Note: DO NOT add `date-time` format from `ajv-formats` to the instance of Ajv, as it will
 * conflict with validation here. Its behavior is encapsulated in this keyword.
 */
export const isoDateKeyword: FuncKeywordDefinition = {
  keyword: 'isoDate',
  modifying: true,
  error: {
    message: 'must be ISO date',
  },
  validate: (keyword, data, metadata, context) => {
    if (data instanceof Date && !isNaN(data.getTime())) {
      return true
    }
    if (typeof data !== 'string' || !dateValidator(data)) {
      return false
    }
    const date = new Date(data)
    context!.parentData[context!.parentDataProperty] = date
    return true
  },
}
```

I was really hoping to get ajv out of the mix and replace that type with a transform such as:

export const IsoDate = Transform(Type.String(), {
  decode: (value) => new Date(value),
  encode: (value) => value.toISOString(),
})

I can't, however, because I have hundreds of types derived from models that would break.

Yeah, I've had a look at this also, and reasoned that end users "should" see the encoded value (i.e. the wire value)

FWIW, I'm not sure about that. Users don't read or manipulate encoded JSON strings, only the higher level types into which they're decoded. In fact, json is a rather unusual form of encoding in that humans can reason about the encoded type at all. Binary formats, text encoding, compression, etc. are largely black boxes that end users can take for granted.

sinclairzx81 commented 1 year ago

@ehaynes99 Hiya,

There is an implementation of Transform types published on 0.31.0-dev-1 which has been fully tested, and should be ready to publish within the next few days. You can install and test it with

$ npm install @sinclair/typebox@dev

Information on this release (it's a big one) can be found on the PR https://github.com/sinclairzx81/typebox/pull/525. Implementing this feature while keeping bundle sizes down (and code maintainable) has been a real effort (on top of the engineering required to make this work), so hope the feature is well received....in all TypeBox finally has a .parse() function...it's called .Decode(T) and it's implemented on both Value and TypeCompiler modules.

Would be interested in getting some feedback before release Cheers! S

ehaynes99 commented 1 year ago

Sorry, was trying to get something together tonight, but I'm out of steam for the day... I do have some thoughts, but I don't want to give low-quality feedback on this as it is a really big milestone. I'm not sure when you were planning on releasing. I'll try to work it out tomorrow, but if the weekend is ok, even better.

ehaynes99 commented 1 year ago

I'll keep hammering at some examples, but I wanted to go ahead and get back to you. Overall, I like it, but a couple of things:

The part I can't figure out is how to generically refer to a schema that will have a specific decoded type. Contrived example, but say I want a datastore to allow passing in a schema for an object with specific fields. I can do something like this:

const IsoDate = Type.Unsafe<Date>({ type: ['string', 'object'], format: 'date-time', isoDate: true })

const Customer = Type.Object({
  id: Type.Number(),
  name: Type.String(),
  createdAt: IsoDate,
  updatedAt: IsoDate,
})

type PersistedSchema = TObject<{
  id: TNumber
  createdAt: TSchema & { static: Date }
  updatedAt: TSchema & { static: Date }
}>

type Database = {
  createTable: <T extends PersistedSchema>(name: string, schema: T) => Promise<void>
  // ...
}

const db = {} as Database

db.createTable('customers', Customer)

But with the potential for transforms, I'm not sure how to do that. Basically, I want a "schema where the StaticDecode of field x is type Y".

const IsoDate = Type.Transform(Type.String())
  .Decode((x) => new Date(x))
  .Encode((x) => x.toISOString())

type PersistedSchema = TObject<{
  id: TNumber
  createdAt: TSchema & ???
  updatedAt: TSchema & ???
}>

Is that possible?

Second (and this may simply be a preference), was there a particular reason for switching to the builder pattern? Builders aren't that common in TS, and I can't really envision a case where you would want a reference to the interstitial types. The declarative way seemed more natural to me, particularly in the context of schema definition.

type TransformOpts<T extends TSchema, U extends Json> = {
  encode: TransformFunction<U, StaticDecode<T>>
  decode: TransformFunction<StaticDecode<T>, U>
}
function Transform<T extends TSchema, U extends Json>(
  encodedSchema: T,
  { encode, decode }: TransformOpts<T, U>,
): TTransform<T, U> {
  const schema = TypeClone.Type(encodedSchema)

  return (
    TypeGuard.TTransform(schema)
      ? (() => {
          const Encode = (value: unknown) => schema[Transform].Encode(encode(value as any))
          const Decode = (value: unknown) => decode(schema[Transform].Decode(value))
          const Codec = { Encode, Decode }
          return { ...schema, [Transform]: Codec }
        })()
      : (() => {
          const Codec = { Decode: decode, Encode: encode }
          return { ...schema, [Transform]: Codec }
        })()
  ) as TTransform<T, ReturnType<typeof decode>>
}

sinclairzx81 commented 1 year ago

@ehaynes99 Hi! Thank you for the feedback! (It's very appreciated)

Database Modelling

The part I can't figure out is how to generically refer to a schema that will have a specific decoded type. Contrived example, but say I want a datastore to allow passing in a schema for an object with specific fields. I can do something like this:

Yeah, I'd actually been giving some thought to database encoding while building out the feature (noting TB is starting to see some usage in database ORM modelling). I'm going to be providing some examples in the near future for how developers can approach layering databases through Transform types, but will show the scripts I've been prototyping with below.

So, the following is a very layered (and somewhat complex) example of how you would model each layer of a vendor specific database. It shows remapping vendor identifiers (such as Mongo's ObjectId), working with generic application types (i.e. created and updated, used as base class properties as per your question) and shows transform inference when applied to a user facing collection interface (essentially replacing Static with the new StaticDecode). The example is a little long winded (as it's handling all layers from the database up to user space), but should show each step involved in layering.

You can copy and paste the following into a test project with @sinclair/typebox@dev installed.

import { Type, Kind, TypeRegistry, type StaticDecode, type TObject } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

// -------------------------------------------------------------------
// Database: Vendor Id (Mongo)
// -------------------------------------------------------------------
TypeRegistry.Set('ObjectId', () => true)

const ObjectID = Type.Unsafe<ObjectId>({ [Kind]: 'ObjectId' })

export class ObjectId { 
  constructor(private readonly _id: string) { } 
  toHex() { return this._id } 
}
// -------------------------------------------------------------------
// Database: Types
// -------------------------------------------------------------------
const DatabaseId = Type.Transform(ObjectID)
  .Decode(value => value.toHex())
  .Encode(value => new ObjectId(value))

const DatabaseDate = Type.Transform(Type.Number())
  .Decode(value => new Date(value))
  .Encode(value => value.getTime())

const DatabaseType = Type.Object({
  _id: DatabaseId,
  created: DatabaseDate,
  updated: DatabaseDate,
})
// -------------------------------------------------------------------
// Application: Type Factory
// -------------------------------------------------------------------
const CreateType = <T extends TObject>(schema: T) => 
  Type.Composite([DatabaseType, schema])

// -------------------------------------------------------------------
// Application: Types
// -------------------------------------------------------------------
const Customer = CreateType(Type.Object({
  name: Type.String(),
  email: Type.String()
}))
// -------------------------------------------------------------------
// Database: Encode & Decode
// -------------------------------------------------------------------
const decoded = Value.Decode(Customer, {            // const decoded = {
  _id: new ObjectId('000000000000000000000000'),    //   _id: '000000000000000000000000',
  created: 0,                                       //   created: 1970-01-01T00:00:00.000Z,
  updated: 0,                                       //   updated: 1970-01-01T00:00:00.000Z,
  name: 'user',                                     //   name: 'user',
  email: 'user@domain.com'                          //   email: 'user@domain.com'
})                                                  // }

// encoded - the encoded database record (write)
const encoded = Value.Encode(Customer, decoded)     // const encoded = {
                                                    //  _id: ObjectId { _id: '000000000000000000000000' },
                                                    //  created: 0,
                                                    //  updated: 0,
                                                    //  name: 'user',
                                                    //  email: 'user@domain.com'
                                                    // }

// -------------------------------------------------------------------
// MongoCollection<Customer>
// -------------------------------------------------------------------
namespace customers {
  export async function find(query: unknown): Promise<StaticDecode<typeof Customer>[]> { 
    /* todo */ return [] 
  }
  export async function insert(value: StaticDecode<typeof Customer>) { 
    /* todo */ 
  }
  export async function update(id: StaticDecode<typeof DatabaseId>, value: Partial<StaticDecode<typeof Customer>>) { 
    /* todo */ 
  }
  export async function remove(value: StaticDecode<typeof DatabaseId>) { 
    /* todo */ 
  }
}
customers.insert({
  _id:'000000000000000000000000',
  created: new Date(0),
  updated: new Date(0),
  name: 'user',
  email: 'user@domain.com'
})
customers.update('000000000000000000000000', {
  email: 'dave@domain.com'
})
customers.remove('000000000000000000000000')

const results = customers.find(
  `where email like '@domain.com'`
)

TransformBuilder

Second (and this may simply be a preference), was there a particular reason for switching to the builder pattern? Builders aren't that common in TS, and I can't really envision a case where you would want a reference to the interstitial types. The declarative way seemed more natural to me, particularly in the context of schema definition.

I actually mulled the fluent builder quite a bit (TB isn't known for using fluent patterns) but have made the decision to use it as a constraint for technical reasons. The big decision to use it comes down to inference issues with TS deriving return types for "yet to be defined" codec functions when used in a declarative context. It was possible to break inference by defining the Encode function before Decode in the previous declarative design.

// OK: This works fine because TS can infer the `string` from the Decode return type
const T1 = Type.Transform(Type.Number(), {
  Decode: value => value.toString(),  // number > string
  Encode: value => parseInt(value)   // string > number
})
// ERROR: This doesn't work because Decode is defined 'after' Encode.
const T2 = Type.Transform(Type.Number(), {
  Encode: value => parseInt(value),   // unknown > number (TS cannot resolve the type for Decode yet)
  Decode: value => value.toString()  // number > string
})

So, the builder is used to enforce that Decode is specified "before" Encode such that TS can reconcile the codec inference appropriately. The other reason to favor fluent, is because Transform types are bijective and must implement both encode and decode, so the builder pattern helps to enforce bijectivity by requiring users to implement both functions to produce a TTransform. For types that can only be decoded (not encoded), the recommendation here will be to encourage users to throw in the Encode function (making things very explicit at the user level of what can and cannot be encoded)
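
As a rough illustration of why the ordering helps inference, here is a stripped-down builder in plain TypeScript. It is only a sketch: `MiniCodec`, `transformOf`, `NumberAsString` and `ParsedDateOnly` are hypothetical names and this is not how Type.Transform is implemented internally.

```typescript
// Hypothetical miniature of a fluent transform builder; the names are illustrative only.
interface MiniCodec<Wire, App> {
  decode: (wire: Wire) => App
  encode: (app: App) => Wire
}

function transformOf<Wire>() {
  return {
    // Step 1: Decode fixes the application type `App` from its return type.
    Decode<App>(decode: (wire: Wire) => App) {
      return {
        // Step 2: by the time Encode is written, its parameter is already known to be `App`.
        Encode(encode: (app: App) => Wire): MiniCodec<Wire, App> {
          return { decode, encode }
        },
      }
    },
  }
}

const NumberAsString = transformOf<number>()
  .Decode((value) => value.toString()) // number -> string
  .Encode((value) => parseInt(value, 10)) // string -> number, `value` inferred as string

// A decode-only codec makes the limitation explicit by throwing in Encode.
const ParsedDateOnly = transformOf<string>()
  .Decode((value) => new Date(value))
  .Encode(() => {
    throw new Error('ParsedDateOnly cannot be encoded')
  })
```

Because each call maps one known type to the next, neither callback needs a parameter annotation, and a codec cannot be produced without supplying both functions.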

Hope that brings some high level insight into the thinking around these aspects (it's actually good to explain some of the reasoning here in a bit more detail prior to publishing such a large feature (it's a big one)). In terms of timings, I may look at the 0.31.0 publish either Saturday or Sunday when things are a bit quiet on the work front.

Thanks again for the feedback! Happy to discuss more if you have any other thoughts or questions Cheers! S

ehaynes99 commented 1 year ago

Gotcha. The db was just a made up example; I'm not using TB for that currently. I was more asking about how to refer to a schema by the decoded type, but thinking about it more, that probably doesn't make sense. Effectively I was looking for:

type SchemaFor<T> = TSchema & { decodeStatic: T }

However, you would really need to narrow the encoded type to be able to do anything with it. The type checker doesn't operate on the decoded values, and if you solely expressed "a schema where field x has decoded type Y", you would necessarily have an encoded type of unknown. So it's likely best that there's NOT such a thing.

Makes sense on the builder. I actually fiddled with the types for quite a while. It would be nice if there were a convenient way to explicitly state the decoded type up front, but I couldn't find any good way to avoid having to state the schema type of the encoded value (as opposed to the encoded value type). Even with the builder syntax, that's a little bit fiddly, but it's unavoidable, I think.

// can't do this
Type.Transform<string, Date>(Type.String())
// it's actually this
Type.Transform<TString, Date>(Type.String())

// can't do this
const IsoDate: TTransform<string, Date> = // ...
// it's actually this
const IsoDate: TTransform<TString, Date> = // ...

// 
const IsoDate = // ...
  // ...
) satisfies TTransform<any, Date>

It's not too bad to use TString, but it could get unwieldy with custom types. There's also a bit of "chicken and egg" for whether you want to infer the types from the schema or express the types and use them during creation of the schema.

The only way I found to make it work consistently was annotating the types of the function parameters (using my variant above as createTransform)

// ok
const IsoDate = createTransform(Type.String(), {
  encode: (date: Date) => date.toISOString(),
  // TODO
})

const IsoDate2 = createTransform(Type.String(), {
                    // typescript: 'date' is of type 'unknown'. [18046]
  encode: (date) => date.toISOString(),
  decode: (str) => new Date(str),
})

// oddly enough, annotating the INPUT of decode fixes the inference of `date`
const IsoDate3 = createTransform(Type.String(), {
  encode: (date) => date.toISOString(),
  decode: (str: string) => new Date(str),
})

// only way I found to explicitly specify `Date` before implementing `decode` without requiring `TString`
const IsoDate4 = createTransform(Type.String(), {
  // TODO
}) satisfies TTransform<any, Date>

myndzi commented 12 months ago

Oh hey, nice work! I'm pleased this got reconsidered :) I was just reading some release notes and hunting down the details of the decision and found this thread. It didn't feel worth opening a new thread, but I wanted to add a couple comments to the conversation.

I'd like to voice support for being able to push errors from encode/decode functions, something which was mentioned up-thread. Going back to the recurring Date example, there's a boundary problem: the wire format is "string", and we validate that it is a string before passing it to decode. However, decode may not succeed unless the string's contents are well-formed. The right thing to do currently seems like "throw an error in the encode/decode function", but it's unclear in the documentation what effect this has.

It would be nice to be able to integrate encode/decode failures into the Errors iterator, to put the error in context. In other words, it's not pleasant to choose between "obtain a list of all the structural problems" or "obtain one decoding error", I'd like them both 😂 - and ideally with the location in the data where the encoding/decoding error occurred.
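
To illustrate the kind of combined report I mean, here is a small sketch in plain TypeScript. It does not use or describe any TypeBox API: `FieldIssue`, `FieldCodec`, `isoDateField`, `countField` and `decodeRecord` are all hypothetical names invented for this example.

```typescript
// Hypothetical sketch: none of these names are TypeBox APIs.
interface FieldIssue {
  path: string
  message: string
}

interface FieldCodec<App> {
  check: (wire: unknown) => boolean // structural check on the wire value
  decode: (wire: unknown) => App // may throw if the contents are malformed
}

const isoDateField: FieldCodec<Date> = {
  check: (wire) => typeof wire === 'string',
  decode: (wire) => {
    const date = new Date(wire as string)
    if (Number.isNaN(date.getTime())) throw new Error('Unable to parse date')
    return date
  },
}

const countField: FieldCodec<number> = {
  check: (wire) => typeof wire === 'number',
  decode: (wire) => wire as number,
}

// Visits every field and records structural failures and decode failures in one list.
function decodeRecord(
  codecs: Record<string, FieldCodec<unknown>>,
  wire: Record<string, unknown>,
): { value: Record<string, unknown>; issues: FieldIssue[] } {
  const value: Record<string, unknown> = {}
  const issues: FieldIssue[] = []
  for (const [key, codec] of Object.entries(codecs)) {
    const path = `/${key}`
    if (!codec.check(wire[key])) {
      issues.push({ path, message: 'Unexpected wire type' })
      continue
    }
    try {
      value[key] = codec.decode(wire[key])
    } catch (error) {
      // A thrown decode error is kept alongside the structural issues, with its location.
      issues.push({ path, message: error instanceof Error ? error.message : String(error) })
    }
  }
  return { value, issues }
}

const outcome = decodeRecord(
  { created: isoDateField, count: countField },
  { created: 'not a date', count: 'three' },
)
```

Here `outcome.issues` holds one entry per failing field, whether the failure was structural or came from a throwing decode function, and each entry carries the path where it occurred.
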

I've dealt repeatedly with the exact Typescript problem you describe re: builder patterns. Type inference is much improved by doing things this way, since each function call is essentially a direct mapping from "before" to "after". Typescript does its best to fulfill inference needs when given more complex situations but it can only do so much and it can be very unpredictable where and how it breaks down. This was a good choice/compromise.

Anyway, the new version looks useful, and as it happens I was just "in the market" for a simpler use-case than my previous thread's, and came back to typebox to see if it would suit. Thanks for your efforts!

alexgorbatchev commented 8 months ago

This seems to work

      t
        .Transform(
          t
            .Transform(t.String())
            .Decode(value => new Date(value))
            .Encode(value => value.toISOString()),
        )
        .Decode(value => new Date(value))
        .Encode(value => value),

ehaynes99 commented 7 months ago

This seems to work

      t
        .Transform(
          t
            .Transform(t.String())
            .Decode(value => new Date(value))
            .Encode(value => value.toISOString()),
        )
        .Decode(value => new Date(value))
        .Encode(value => value),

I'm not sure what that's intending to do, but that's no more validation than the nested one. In the case of parsed strings, you can use the FormatRegistry to register a date-time format. That will validate the string value before attempting decoding. A simple version:

import { FormatRegistry, Type } from '@sinclair/typebox'
import { Value } from '@sinclair/typebox/value'

FormatRegistry.Set('date-time', (value: string) => {
  const pattern =
    /^\d\d\d\d-((0[1-9])|(1[0-2]))-((0[1-9])|([1-2]\d)|(3[01]))[tT](?:(([0-1]\d)|(2[0-3])):[0-5]\d:[0-5]\d|23:59:60)(?:\.\d+)?(?:[zZ]|[+-](([0-1]\d)|(2[0-3])):[0-5]\d)$/
  return pattern.test(value)
})

export const IsoDate = Type.Transform(Type.String({ format: 'date-time' }))
  .Decode((s) => new Date(s))
  .Encode((d) => d.toISOString())

console.log(Value.Check(IsoDate, '2023-01-26T12:34:56.789Z'))
// outputs: true

console.log(Value.Errors(IsoDate, 'abcdef').First())

// outputs:
// {
//   type: 50,
//   schema: {
//     format: 'date-time',
//     type: 'string',
//     [Symbol(TypeBox.Kind)]: 'String',
//     [Symbol(TypeBox.Transform)]: { Decode: [Function (anonymous)], Encode: [Function (anonymous)] }
//   },
//   path: '',
//   value: 'abcdef',
//   message: "Expected string to match 'date-time' format"
// }

There is a more robust version of the check in the examples, as a regex like above will still accept some nonsense values like "February 31st". https://github.com/sinclairzx81/typebox/blob/master/example/formats/date-time.ts
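
For a rough idea of what rejecting "February 31st" involves, here is a small calendar-aware check in plain TypeScript. This is my own sketch and not the code from the linked example: `DATE_TIME_SHAPE` and `isCalendarDateTime` are hypothetical names, and as a simplification it does not accept the `23:59:60` leap second that the regex above allows.

```typescript
// Hypothetical helper, not the check from TypeBox's examples directory.
// Simplification: leap seconds (23:59:60) are rejected.
const DATE_TIME_SHAPE =
  /^(\d{4})-(\d{2})-(\d{2})[tT](\d{2}):(\d{2}):(\d{2})(?:\.\d+)?(?:[zZ]|[+-](?:[01]\d|2[0-3]):[0-5]\d)$/

function isCalendarDateTime(value: string): boolean {
  const match = DATE_TIME_SHAPE.exec(value)
  if (match === null) return false
  const year = Number(match[1])
  const month = Number(match[2])
  const day = Number(match[3])
  const hour = Number(match[4])
  const minute = Number(match[5])
  const second = Number(match[6])
  if (hour > 23 || minute > 59 || second > 59) return false
  if (month < 1 || month > 12 || day < 1) return false
  // Gregorian leap-year rule decides whether February has 29 days.
  const isLeapYear = (year % 4 === 0 && year % 100 !== 0) || year % 400 === 0
  const monthLengths = [31, isLeapYear ? 29 : 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
  const daysInMonth = monthLengths[month - 1] ?? 0
  return day <= daysInMonth
}
```

Since it has the same `(value: string) => boolean` shape as the callback passed to FormatRegistry.Set above, a function like this could be registered in place of the regex test.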

benevbright commented 7 months ago

hi @sinclairzx81 could you share updates about the prototype for fastify-type-provider-typebox? The branch seems gone.

ehaynes99 commented 7 months ago

@sinclairzx81 Any objections to this? https://github.com/fastify/fastify-type-provider-typebox/pull/127

sinclairzx81 commented 7 months ago

@ehaynes99 Hi,

No objections :) The TB provider project is managed by the Fastify team, and should be very open to all community contributions.

Just for some history on transforms, I did actually submit a PR a few months back to enable the StaticDecode/StaticEncode pipeline, but decided to wait (as I didn't want to deprecate the previous version range). If this is being considered for your PR, it might also be worth taking a look at the validation and serialization compilers (to automatically Decode and Encode per request).

The PR for this can be found here:

https://github.com/fastify/fastify-type-provider-typebox/pull/99

The validation and serialization compiler code is here:

https://github.com/fastify/fastify-type-provider-typebox/pull/99/files#diff-adb065f7ea26f7f005649ad48bcbf0534bc860c701bb4e19c3917b125f4e2f20R1-R69

Unfortunately, I seem to have misplaced this branch while taking another look at a provider issue, so the code written will need to be sourced from the PR record. If you wanted to take a look at this also, you're more than welcome :) There's actually been some new updates in TB (namely the Clean and Default functions) that should bring TB validation very close to Ajv validation.

Good work! S