sinclairzx81 / typebox

Json Schema Type Builder with Static Type Resolution for TypeScript
Other
4.78k stars 152 forks source link

Docs should advise using `Check()` before calling `Errors()` to mitigate DOS attacks #328

Closed jtlapp closed 1 year ago

jtlapp commented 1 year ago

Since I wanted to report validation errors, my initial inclinations was just to call Errors() to get the errors and only perform error processing if the list were non-empty. But it's better to call Check() first for performance reasons, especially to mitigate the impact of DOS attacks.

It's not apparent from reading the docs, but Check() short-circuits on the first failed condition, while Errors() checks all conditions. It's particularly important to know this about regex matching. Consider the following:

import { Type } from '@sinclair/typebox';
import { TypeCompiler } from '@sinclair/typebox/compiler';

const checker = TypeCompiler.Compile(
  Type.Object({
    text: Type.RegEx(/a-zA-Z/, { maxLength: 4 }),
  })
);

console.log('CODE:\n' + checker.Code());
console.log('\nRESULT:', [...checker.Errors({ text: '12345' })]);
CODE:
const local_0 = /a-zA-Z/;
return function check(value) {
  return (
    (typeof value === 'object' && value !== null && !Array.isArray(value)) &&
    (typeof value.text === 'string') &&
    (value.text.length <= 4) &&
    (local_0.test(value.text))
 )
}

RESULT: [
  {
    type: 36,
    schema: {
      maxLength: 4,
      type: 'string',
      pattern: 'a-zA-Z',
      [Symbol(TypeBox.Kind)]: 'String'
    },
    path: '/text',
    value: '12345',
    message: 'Expected string length less or equal to 4'
  },
  {
    type: 37,
    schema: {
      maxLength: 4,
      type: 'string',
      pattern: 'a-zA-Z',
      [Symbol(TypeBox.Kind)]: 'String'
    },
    path: '/text',
    value: '12345',
    message: 'Expected string to match pattern a-zA-Z'
  }
]

The check() function tests string length before checking the regex pattern and short-circuits, as we'd expect a good validator to do. The Errors() function always checks both string length and regex and is therefore not suitable as a first line of defense.

Of course, once you understand that Errors() reports all inconsistencies, it's apparent that Check() ought to come before Errors() -- but this fact doesn't appear in the docs either.

So I'm thinking that users could benefit from a little more information that helps with creating more robust code.

jtlapp commented 1 year ago

Thinking about this some more, it seems that without short-circuiting, I can't return errors to untrusted clients. In order to be able to return errors, I would have to either match regexes against any length string the client provides or else not use the regex feature of TypeBox and do my regex testing manually after TypeBox validation.

jtlapp commented 1 year ago

On further thought, the most robust solution would only produce error messages for authenticated users. But I still think it's important to know that Errors() will not short-circuit on invalid maxLength.

sinclairzx81 commented 1 year ago

@jtlapp Hi!

TypeBox splits value checking and diagnostics (errors) into separate modules as integrators may need to implement unique and specific error handling / assertions in their applications (and where in many cases, the diagnostic check is not required at all), In your case it sounds like C.Errors(...) may only be invoked for Authenticated Users....which should be possible.

Preventing Non-Exhaustive Diagnostics

The C.Errors(...) function returns an iterator which is used to yield through each validation check. If exhaustive errors are not required (for example you're only interested in computing up to the first error), you can use .next() on the iterator to only yield as far as the first failing case (allowing the implementer to short circut diagnostics externally)

If C.Check() returns false, you are guaranteed to have at least 1 error yielded on C.Errors(). The library does implement extensive unit tests to ensure for this.

// -------------------------------------------------------
// User Defined Assert (first error only)
// -------------------------------------------------------

function Assert<T extends TSchema>(schema: T) {
  const check = TypeCompiler.Compile(schema)
  return (value: unknown) => {
    if(check.Check(value)) return
    const first = check.Errors(value).next()!.value // first error only
    throw new Error(first.message)
  }
}

// -------------------------------------------------------
// Test
// -------------------------------------------------------

const Vector = Type.Object({
  x: Type.Number(),
  y: Type.Number()
})

const VectorAssert = Assert(Vector)

VectorAssert('not a vector') // throws: Expected Object

For all errors you can use.

const exhaustiveDiagnotics = [...C.Errors(value)]

Happy to include a little bit of information in the documentation around this.

DOS mitigation

TypeBox doesn't actually implement any RegEx by default (with an exception made for property key tests for Record<string | number, ...>. Regex, DOS has been a consideration with respect to keeping regular expressions from the library, and I believe AJV does something similar with ajv-formats. In this respect, regular expressions are very much pushed back on the implementer to specify their expressions and ensure they are optimal.

TypeBox does however provide a reference port of the ajv-formats module that can be used to configure the TypeCompiler string formats for common patterns like email, uuid, uri, etc. You can find this https://github.com/sinclairzx81/typebox/blob/master/example/formats/additional.ts with the expressions themselves generally considered to be DOS safe (you will need to copy and paste this into your project at this stage)

import { AdditionalFormats } from './path/to/additional'

AdditionalFormats.Configure()

const T = Type.String({ format: 'email' })

Value.Check(T, 'dave@domain.com') // ok

Hope this helps! S

jtlapp commented 1 year ago

Okay, thank you. It's clear that if I only wanted to return a single error for a single field, then this approach suffices. I was hoping to short-circuit on a per-field basis, rather than per-object, but I could make do with this on the server side. It's really only client-side pre-validation of form data that needs to report multiple errors, and the DOS issue doesn't exist there. I think this solution will work for me.

Also thanks for the reference to the safe regexes. I think anything with security implications needs to be centrally managed and updated, though, so I would probably look for exports of these regexes from a well-maintained library.

Feel free to close this issue if you don't want to keep it open to remind you of any doc mods.

Thanks so much for your super-prompt help!

jtlapp commented 1 year ago

I think all the docs need to say is:

(1) Errors() performs one validation check per iteration; it does not pre-determine the errors. (2) In both Check() and Errors(), maxLength is checked before checking a regex pattern.

jtlapp commented 1 year ago

Also, I'm curious, in your Assert() function, why does it take a schema instead of a compiled schema? Are you caching compilations? I assumed that the point of compiling was to be able to do the work once and never again.

And, why do you Pascal case your functions? This is at odds with established JavaScript convention. It also annoys me, because my brain is trained to see Pascal-cased identifiers as types. Golang Pascal-cases public identifiers, but scanning the code, I notice that you're doing it for private ones too.

I do see some sense in Pascal-casing the functions that define schema types, specifically because they correspond to types, and doing so makes the schema look like a type definition.

sinclairzx81 commented 1 year ago

@jtlapp Hey, no problem :)

Also thanks for the reference to the safe regexes. I think anything with security implications needs to be centrally managed and updated, though, so I would probably look for exports of these regexes from a well-maintained library.

I totally agree! I think at this stage the ajv-formats plugin library is probably the best point of call (which is where those reference expressions were lifted from), but not sure if there exists another defacto / vetted repository which maintains common patterns that are decoupled from Ajv (but I think they are the highest quality I've seen)

A project that provides defacto expressions does sound like a good community endeavor...not much fun to maintain tho! :D

I think all the docs need to say is: (1) Errors() performs one validation check per iteration; it does not pre-determine the errors. (2) In both Check() and Errors(), maxLength is checked before checking a regex pattern.

Alright, will look at including something along these lines on the next release cycle (likely sometime later this week). Also good spotting on the length before regex assertion order (that's an eye for detail). Will keep you updated on this thread once the documentation has been amended.

Also, I'm curious, in your Assert() function, why does it take a schema instead of a compiled schema? Are you caching compilations? I assumed that the point of compiling was to be able to do the work once and never again.

TypeBox doesn't cache compilation but it does expect the user to. The general pattern is to wrap the compiled routine TypeCheck<T> in a closure, and return a function to the user, so as follows.

function CompileAssert<T extends TSchema>(schema: T) {
   const compiled = TypeCompiler.Compile(schema) // cached as a closure
   return (value: unknown) => { // return function to caller
      if(complied.Check(value)) return
      throw Error('invalid')
   }
}

const C = CompileAssert(Type.String()) // hold onto C
C(12345) // throws 'invalid'

Other patterns include storing the compiled routine as a class member (which is my preference when using TB). See here for runtime safe implementations of Map, Set and Array which store the compiled routine as a private class member used throughout the class lifetime.

And, why do you Pascal case your functions? This is at odds with established JavaScript convention. It also annoys me, because my brain is trained to see Pascal-cased identifiers as types. Golang Pascal-cases public identifiers, but scanning the code, I notice that you're doing it for private ones too.

Originally it was to draw parity with Number, String, Boolean, Object and Array JavaScript primitives; but also because types are usually named with Pascal (as are generic arguments like the T in <T extends Foo>). It wasn't really to be at odds with JS conventions; but there was some motivation to make new conventions by drawing a distinction between JS values representing values, and JS values representing types. I felt Pascal casing made for good visual distinction in this regard.

As for the rest of the library (which came along much much later), I mostly adopted the Pascal convention for naming consistency; but also with a view that users would probably end up wrapping much of the TB compiler and value checking infrastructure in their projects anyway. In this respect, Pascal is partly an incentive to the user that they should wrap assertion logic in more more idiomatic JS friendly conventions (with TypeBox remaining focused on providing relatively lower level fast assertions without getting too bogged down in higher level abstractions). That was loosely the thinking anyway...

For most other projects I manage, I generally opt inline with common JS convention (but am quite partial to Pascal casing) :)

Hope that answers some of your questions! :) I think for now ill close up the issue, but will report back on this thread once the documentation has been amended as per request.

All the best! S

jtlapp commented 1 year ago

The general pattern is to wrap the compiled routine TypeCheck in a closure, and return a function to the user, so as follows.

Oh I see. Gosh, sometimes I think it would help for me to learn an FP-only language so I'd think more that way. Thanks for the detailed explanation.

This library is awesome. I'm surprised it took so long for someone to take this approach.

jtlapp commented 1 year ago

It would be nice if there were a JS library that tracked the OWASP recommended regexes.

sinclairzx81 commented 1 year ago

@jtlapp Hiya, just amended the docs to suggest Check() should be called before Errors() with a bit more insight into what Errors() function is doing under the hood.

Documentation update can be found under the TypeCompiler section, with the copy reading as follows.

Use Errors(...) to generate diagnostics for a value. The Errors(...) function will run an exhaustive check across the value and yield any error found. For performance, this function should only be called after failed Check(...).

At some point in the future, I'd actually like to document much of the internals of the library (including a section on ReDOS) on an actual documentation site (essentially taking what's in the readme now, moving it to the website and trimming back some of the copy in the readme). Hopefully what's there now should provide enough insight into the library for effective use though.

Thanks again for the suggestion! Cheers S