gristlabs / ts-interface-checker

Runtime library to validate data against TypeScript interfaces.

Apache License 2.0

Multiple nested errors #47

Closed alexmojaki closed 3 years ago

alexmojaki commented 3 years ago

This is an attempt to solve #3 and #16, allowing the result of validate to have multiple errors in the same nested array so that users can solve several problems in their input data at once.

At the moment this is a proof of concept. There's still quite a bit of work to do and decisions to be made, including whether to go forward with this at all.

Most types don't need to be able to report multiple errors, so they haven't changed. There are some types we could consider adding this to, but I'm reluctant about them:

TArray: if you get one item wrong, several others are probably wrong for the same reason.
TTuple and TParamList: if you forget an item before the end or get the order wrong, this will cause several errors with the same fix.
TUnion: it might be nice to show why a few candidates failed instead of just one best guess, but this is questionable.

So for now it's just TIface and TIntersection. TIface is a good example to show how this works.

First, this line is unchanged:

if (typeof value !== "object" || value === null) { return ctx.fail(null, "is not an object", 0); }

If it's not an object, there's nothing else to report, so we call ctx.fail directly and then return early. But from here on, fail must no longer be called directly on ctx, at least until this checker returns. Failures will instead be stored in forks. These could be renamed to branches, children, subcontexts, etc. They should be used as follows:

1. Get a forked context with ctx.fork().
2. Use the fork rather than ctx for (potential) failures, whether it's in a deeper checker as in propCheckers[i](v, fork) or directly as in fork.fail(name, null, 1).
3. After using the fork to check one 'thing' (e.g. a property or a base class), write:

          if (!ctx.completeFork()) {
            return false;
          }

   The main context might make use of the information that you're done with the latest fork.
   It then returns true if there's 'room' for more errors and you should continue checking, or false if you should just return now.
4. Once you're done checking, return !ctx.failed() to check whether any failures were gathered along the way without triggering an early return.
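As a rough illustration of the steps above, here is a minimal sketch of an object checker that follows the fork protocol. The `Ctx` interface, `BoolCtx`, `isString` and `checkObject` are hypothetical names made up for this example, and the context is a trivial one that only records a boolean. It is not the code from this PR.

```typescript
// Hypothetical context shape, reduced to the four operations described above.
interface Ctx {
  fail(relPath: string | null, message: string | null, score: number): false;
  fork(): Ctx;
  completeFork(): boolean;
  failed(): boolean;
}

// Trivial context (an assumption for this sketch): remembers only whether anything failed.
class BoolCtx implements Ctx {
  private hasFailed = false;

  fail(_relPath: string | null, _message: string | null, _score: number): false {
    this.hasFailed = true;
    return false;
  }

  fork(): Ctx { return this; }

  // Once one failure is recorded there is no 'room' for more.
  completeFork(): boolean { return !this.hasFailed; }

  failed(): boolean { return this.hasFailed; }
}

type PropChecker = (value: unknown, ctx: Ctx) => boolean;

const isString: PropChecker = (v, ctx) =>
  typeof v === "string" || ctx.fail(null, "is not a string", 0);

function checkObject(value: unknown, props: Record<string, PropChecker>, ctx: Ctx): boolean {
  // Nothing else to report for a non-object, so fail on ctx directly and return early.
  if (typeof value !== "object" || value === null) {
    return ctx.fail(null, "is not an object", 0);
  }
  const obj = value as Record<string, unknown>;
  for (const name of Object.keys(props)) {
    const fork = ctx.fork();                       // step 1
    if (!(name in obj)) {
      fork.fail(name, "is missing", 1);            // step 2, direct failure
    } else if (!props[name](obj[name], fork)) {    // step 2, deeper checker
      fork.fail(name, null, 1);
    }
    if (!ctx.completeFork()) {                     // step 3
      return false;
    }
  }
  return !ctx.failed();                            // step 4
}

const shape = { name: isString, email: isString };
const okResult = checkObject({ name: "a", email: "b" }, shape, new BoolCtx());
const badResult = checkObject({ name: 1 }, shape, new BoolCtx());
const notObjectResult = checkObject(null, shape, new BoolCtx());
```

With a boolean-only context the checker stops at the first failing property. A context that collects details could instead keep going until it runs out of room.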

The NoopContext still does very little, only storing a boolean and returning itself from fork(). But the extra operations have some inevitable overhead. I haven't done proper calculations yet but at a glance the benchmark (great thing to have!) seems to run something like 10-15% slower.

The DetailContext keeps forks that have failures in them. The way it does this right now is not very clever and will probably be optimised. The number of failed forks it keeps is limited (right now to 3, but that will be made configurable) so that users don't get flooded with errors. Once it reaches that limit, ctx.completeFork() returns false to indicate you should exit. But since you can now have a whole tree of contexts, the total number of forks and errors is unlimited.
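To make the limit concrete, here is a small sketch of a context that keeps at most three failed forks per level. `BoundedCtx`, `failEverywhere` and `countFailures` are hypothetical names for this example only, and the real DetailContext stores much more (paths, messages, scores). The point is just that the cap applies per context, so a tree of contexts can still hold more than three errors in total.

```typescript
// Hypothetical collector: keeps at most `maxForks` failed forks per context.
class BoundedCtx {
  public readonly failedForks: BoundedCtx[] = [];
  private ownFailure = false;
  private current: BoundedCtx | null = null;

  constructor(private readonly maxForks: number = 3) {}

  fail(): false {
    this.ownFailure = true;
    return false;
  }

  fork(): BoundedCtx {
    this.current = new BoundedCtx(this.maxForks);
    return this.current;
  }

  // Keeps the latest fork if it failed; returns whether there is room for more.
  completeFork(): boolean {
    const fork = this.current;
    this.current = null;
    if (fork !== null && fork.failed()) {
      this.failedForks.push(fork);
    }
    return this.failedForks.length < this.maxForks;
  }

  failed(): boolean {
    return this.ownFailure || this.failedForks.length > 0;
  }

  // Counts failures recorded directly on any context in this tree.
  countFailures(): number {
    let total = this.ownFailure ? 1 : 0;
    for (const fork of this.failedForks) {
      total += fork.countFailures();
    }
    return total;
  }
}

// Tries to fail `width` children at every level down to `depth`,
// stopping at each level as soon as completeFork() says to exit.
function failEverywhere(ctx: BoundedCtx, depth: number, width: number): void {
  if (depth === 0) {
    ctx.fail();
    return;
  }
  for (let i = 0; i < width; i++) {
    failEverywhere(ctx.fork(), depth - 1, width);
    if (!ctx.completeFork()) {
      return;
    }
  }
}

const flatCtx = new BoundedCtx(3);
failEverywhere(flatCtx, 1, 10);
const flatKept = flatCtx.failedForks.length;   // capped at 3 even though 10 would fail

const nestedCtx = new BoundedCtx(3);
failEverywhere(nestedCtx, 2, 10);
const nestedTotal = nestedCtx.countFailures(); // 3 forks, each holding 3 failures
```

In this model the total grows with nesting depth (up to 3 per level, multiplied across levels), which is the sense in which the overall number of errors is not bounded by the per-context limit.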

Some things that still need addressing:

Formatting a nice hierarchical multiline VError message when necessary
A proper API - I have many thoughts on this topic
The issue I mentioned in https://github.com/gristlabs/ts-interface-checker/pull/4#issuecomment-808807995

What are your thoughts so far?

alexmojaki commented 3 years ago

I think the API needs some redesigning. The names test, check, and validate are easy to confuse, there's a strict copy of each one, and it's hard to add more options without creating a combinatorial explosion of methods. In particular I want to add options for https://github.com/gristlabs/ts-interface-checker/pull/4#issuecomment-808807995 and https://github.com/alexmojaki/ts-interface-checker/pull/1, and there may be more in the future such as an option to deal with #46 or #38.

I propose having one central method which users are encouraged to use where they can configure everything, something like this:

export class Checker {

    // This is the central method, but we could create a new method with a new name instead
    public validate(value: any, options: CheckOptions): IErrorDetail | null {
       ...
    }

    public check(value: any, options: CheckOptions): void {
        this.validate(value, {...options, onError: OnError.throw});
    }

    public test(value: any, options: CheckOptions): boolean {
        return this.validate(
            value,
            {...options, onError: OnError.return, numErrors: 0}
        ) == null;
    }

    ...
}

interface CheckOptions {
    extras: Extras
    onError: OnError

    /**
     * Number of errors that can be nested directly
     * under another error.
     *
     * Internally, the number of forks a DetailContext can hold.
     *
     * 0: Don't record anything.
     *   - Used for test() to just return a boolean
     *   - Could be used in `check()` if someone wants to raise an error
     *      but doesn't want a message.
     *   - Creates a NoopContext
     * 1: Old default behaviour for check/validate
     * > 1: Can nest several errors
     *   - May lead to a multiline error message when using OnError.throw
     *   - Requires checkRoot to be true
     */
    numErrors: number

    /**
     * If true, wraps the top level error in another layer about the
     * root type being checked. For example, this would change
     * the error message from:
     *
     *     value.extras is missing
     *
     * to:
     *
     *     value is not a CheckOptions; value.extras is missing
     *
     * Similarly the returned IErrorDetail will have an extra top layer.
     *
     * This is required if numErrors > 1 so that multiple errors can be
     * nested under the root IErrorDetail.
     *
     * Internally this wraps the type being checked in a TName first.
     */
    checkRoot: boolean
}

/** What to do when extraneous values are found */
enum Extras {
    /** Allow extra values */
    ignore,

    /** Formerly strict mode: do what onError says */
    error,

    /** Remove extra values in place */
    delete,
}

/** What to do when an error is encountered */
enum OnError {
    /** Throw a VError */
    throw,

    /** Return an object describing the errors */
    return,
}

Questions:

Do you like this idea?
What should the default values for options be? Keep behaviour the same? Default numErrors to 3? Require specifying certain options?
Deprecate some old methods? I suggest deprecating the strict methods, but leaving check and test as is for the clean return types.
How to deal with conflicting options, e.g. numErrors: 1, checkRoot: true, or check(value, {onError: OnError.throw})? Just overwrite them, or throw an error?

alexmojaki commented 3 years ago

> Agreed that an options object is more flexible.
>
> I like adding an option for extraProps to make "strict" versions redundant, and to add the option to delete extra properties. So perhaps all methods can take options with {extraProps?: 'error' | 'delete'}, by default omitted.

So if I understand correctly:

1. `options` will only have one key `extraProps` and none of the others mentioned, so its purpose (as opposed to an `extraProps` argument) will be to make adding more options in the future easier.
2. Both `options` itself and `extraProps` are optional.

By the way, I called it extras because in general it also deals with extra items in tuples and parameter lists and I didn't know if those are considered 'properties', but I'm happy with extraProps since interface properties are what people usually have in mind anyway.

> For the distinction between validate / check / test, I think using separate methods is a bit better than a single method with different return types -- in particular for type-checking, and explaining what it does. As long as these methods exist, I don't see much purpose in emulating them by passing options to some unified method.
>
> Last but not least, for the level of details in errors, I think more informative errors are just better. The only reason against making them the only behavior is if there is existing code that relies on the previous format. I am thinking, if we bump the major version and explain the difference, it would be fine to just upgrade the errors to contain more details, and avoid burdening the user with more options.

Right, the idea of a central method with lots of options was that the user would have one clear place to look to discover all their needs, but if we just reduce the number of options there's less need for that and I like your thinking.

So shall the number of errors per object just be fixed at 3?

dsagal commented 3 years ago

> > Agreed that an options object is more flexible. I like adding an option for extraProps to make "strict" versions redundant, and to add the option to delete extra properties. So perhaps all methods can take options with {extraProps?: 'error' | 'delete'}, by default omitted.
>
> So if I understand correctly:
>
> 1. `options` will only have one key `extraProps` and none of the others mentioned, so its purpose (as opposed to an `extraProps` argument) will be to make adding more options in the future easier.
> 2. Both `options` itself and `extraProps` are optional.

Right. It cuts the number of methods to keep in mind in half, and also allows adding the "delete" interface easily.

> By the way, I called it extras because in general it also deals with extra items in tuples and parameter lists and I didn't know if those are considered 'properties', but I'm happy with extraProps since interface properties are what people usually have in mind anyway.

Yeah. Just extras made me think these options might be for extra functionality or extra checks, and wonder what it might mean. I think extraProps sounds reasonable for tuples too. Or maybe extraneous if you like that better.

> > For the distinction between validate / check / test, I think using separate methods is a bit better than a single method with different return types -- in particular for type-checking, and explaining what it does. As long as these methods exist, I don't see much purpose in emulating them by passing options to some unified method. Last but not least, for the level of details in errors, I think more informative errors are just better. The only reason against making them the only behavior is if there is existing code that relies on the previous format. I am thinking, if we bump the major version and explain the difference, it would be fine to just upgrade the errors to contain more details, and avoid burdening the user with more options.
>
> Right, the idea of a central method with lots of options was that the user would have one clear place to look to discover all their needs, but if we just reduce the number of options there's less need for that and I like your thinking.
>
> So shall the number of errors per object just be fixed at 3?

I think it's a good choice. I don't really see myself changing it, but if anyone cares about it, it could be added as an option.
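For readers following along, here is a minimal sketch of what the three `extraProps` behaviours discussed above (omitted, 'error', 'delete') could mean for a plain object. `ExtraPropsOptions` and `handleExtraProps` are hypothetical names for illustration. This is not the library's API, and it ignores tuples and parameter lists.

```typescript
// Hypothetical options shape taken from the discussion; omitting extraProps means "ignore".
interface ExtraPropsOptions {
  extraProps?: "error" | "delete";
}

// Returns the names of unexpected properties that should be reported as errors.
// In "delete" mode the extra properties are removed in place and nothing is reported.
function handleExtraProps(
  value: Record<string, unknown>,
  knownProps: string[],
  options: ExtraPropsOptions = {},
): string[] {
  const extras = Object.keys(value).filter((key) => knownProps.indexOf(key) === -1);
  if (options.extraProps === "delete") {
    for (const key of extras) {
      delete value[key];
    }
    return [];
  }
  return options.extraProps === "error" ? extras : [];
}

const sample: Record<string, unknown> = { id: 1, nickname: "x" };
const ignored = handleExtraProps(sample, ["id"]);                              // nothing reported
const reported = handleExtraProps(sample, ["id"], { extraProps: "error" });    // reports "nickname"
const afterDelete = handleExtraProps(sample, ["id"], { extraProps: "delete" }); // removes "nickname"
```

Keeping both keys optional means existing calls without an options argument keep the current non-strict behaviour in this sketch.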

alexmojaki commented 3 years ago

Now it's ready!

Some notes:

validate() now returns an array. My previous idea of adding an extra check at the root by checking a TName doesn't work if you want to use a Checker returned by the methods such as getProp where there might not be a name.
I've made the error message as flat as possible to emulate the old behaviour and reduce newlines. For example in the new tests you can see this error:

    value.ab is not a AB
        value.ab is not a A; value.ab.a is missing
        value.ab is not a B; value.ab.b is missing

If error messages were consistently tree-like, it would look like this:

    value.ab is not a AB
        value.ab is not a A
            value.ab.a is missing
        value.ab is not a B; value.ab.b is missing

The comments on errorLines explain in more detail.
What should happen once you're happy with this? https://github.com/alexmojaki/ts-interface-checker/pull/1 comes next but both PRs introduce breaking changes. Leave this unmerged until https://github.com/alexmojaki/ts-interface-checker/pull/1 is merge ready? Merge this into master but don't release?
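The "as flat as possible" formatting described in the notes above can be sketched roughly like this, assuming that a single nested error is joined onto its parent's line with "; " and that several nested errors each go on their own indented line. `ErrorNode`, `flatErrorLines` and the four-space indent are assumptions for this example. The actual errorLines in the PR may differ.

```typescript
// Hypothetical error node; not necessarily the library's IErrorDetail.
interface ErrorNode {
  path: string;
  message: string;
  nested?: ErrorNode[];
}

// Follows chains of single nested errors on one line, joined with "; ".
// As soon as a node has several nested errors, each one starts a new indented line.
function flatErrorLines(node: ErrorNode, indent: string = ""): string[] {
  let line = `${node.path} ${node.message}`;
  let nested = node.nested ?? [];
  while (nested.length === 1) {
    const only = nested[0];
    line += `; ${only.path} ${only.message}`;
    nested = only.nested ?? [];
  }
  const lines = [indent + line];
  for (const child of nested) {
    lines.push(...flatErrorLines(child, indent + "    "));
  }
  return lines;
}

const detail: ErrorNode = {
  path: "value.ab",
  message: "is not a AB",
  nested: [
    { path: "value.ab", message: "is not a A", nested: [{ path: "value.ab.a", message: "is missing" }] },
    { path: "value.ab", message: "is not a B", nested: [{ path: "value.ab.b", message: "is missing" }] },
  ],
};

const formatted = flatErrorLines(detail).join("\n");
```

For an error with only one failure at each level, this collapses to a single line, which is how it stays close to the old message format.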

alexmojaki commented 3 years ago

I've done a detailed benchmark analysis. Seems like it's now 16% slower.

Disclaimer: I'm not a statistician.

Click for full analysis

```python
import re
import statistics

output = """
$ npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && git checkout master && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench && npm run bench

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,191,607 ops/sec ±0.20% (97 runs sampled)
protobuf verify x 8,823,535 ops/sec ±0.29% (96 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,189,434 ops/sec ±0.17% (98 runs sampled)
protobuf verify x 8,702,189 ops/sec ±0.21% (98 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,138,205 ops/sec ±0.19% (98 runs sampled)
protobuf verify x 8,205,776 ops/sec ±2.26% (93 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,147,411 ops/sec ±0.51% (97 runs sampled)
protobuf verify x 8,533,095 ops/sec ±0.52% (95 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,047,723 ops/sec ±0.18% (95 runs sampled)
protobuf verify x 7,943,914 ops/sec ±2.28% (90 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,080,907 ops/sec ±0.21% (96 runs sampled)
protobuf verify x 8,196,217 ops/sec ±0.18% (96 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 996,332 ops/sec ±0.20% (92 runs sampled)
protobuf verify x 7,708,242 ops/sec ±0.31% (91 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,056,713 ops/sec ±0.16% (93 runs sampled)
protobuf verify x 7,847,602 ops/sec ±0.43% (96 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,101,959 ops/sec ±0.17% (95 runs sampled)
protobuf verify x 8,127,860 ops/sec ±0.26% (96 runs sampled)
Switched to branch 'master'
Your branch is up to date with 'origin/master'.

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,216,527 ops/sec ±0.24% (97 runs sampled)
protobuf verify x 7,780,665 ops/sec ±0.18% (94 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,297,354 ops/sec ±0.22% (95 runs sampled)
protobuf verify x 8,057,131 ops/sec ±0.17% (97 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,174,733 ops/sec ±0.29% (95 runs sampled)
protobuf verify x 7,451,642 ops/sec ±0.25% (96 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,246,502 ops/sec ±0.44% (88 runs sampled)
protobuf verify x 7,660,006 ops/sec ±0.28% (93 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,236,564 ops/sec ±0.46% (93 runs sampled)
protobuf verify x 7,919,159 ops/sec ±0.20% (94 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,269,222 ops/sec ±0.17% (92 runs sampled)
protobuf verify x 7,855,862 ops/sec ±0.29% (97 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,260,112 ops/sec ±0.17% (96 runs sampled)
protobuf verify x 7,858,326 ops/sec ±0.18% (94 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,195,676 ops/sec ±0.22% (97 runs sampled)
protobuf verify x 7,504,485 ops/sec ±0.20% (94 runs sampled)

> ts-interface-checker@0.2.1 bench
> tsc && node test/bench/bench.js

ts-interface-checker x 1,178,706 ops/sec ±0.23% (97 runs sampled)
protobuf verify x 7,514,983 ops/sec ±0.90% (93 runs sampled)
"""

numbers = [int(n.replace(",", "")) for n in re.findall(r"[\d,]{7,}", output)]
half = len(numbers) // 2

# 'before' was printed second, after git checkout master
before = numbers[half:]
after = numbers[:half]
assert len(before) == len(after)
assert len(before) % 2 == 0


def average_ratio(nums):
    ratios = []
    for i in range(0, len(nums), 2):
        tsi, proto = nums[i:i + 2]
        ratio = tsi / proto
        ratios.append(ratio)
    print(ratios)
    # After:
    # [0.13504870780248507, 0.13668216123552362, 0.13870778339550094, 0.13446598215536099, 0.1318900229785972, 0.13187876797307832, 0.1292554125830507, 0.13465425489213137, 0.13557799962105646]
    # Before:
    # [0.15635257397664595, 0.16101935043627813, 0.1576475359390588, 0.16272859316298185, 0.15614839909136816, 0.1615636832724404, 0.1603537445506842, 0.1593281884099975, 0.15684746059971127]

    # Higher ratios are better
    return statistics.geometric_mean(ratios)


print(average_ratio(after) / average_ratio(before))
# 0.8436143737304612
# i.e. 84.3% as fast as before
```