Open karwa opened 1 year ago
Related to #156 and #61
As for performance: we could use a bit-mask rather than an array of strings (since the standard only has a handful of unique validation errors). The cost is that we would lose some information, namely:
It would also be more limiting, as we would be unable to store associated information with each error (e.g. cursor position). But, in return, we would be able to store all errors that were encountered in a fixed amount of memory.
Another alternative would be for the parser to accept a validation error callback closure. Callers could supply a closure which collects all errors in an array, or which ignores them entirely, and the overhead in the parser should not be worse than any other virtual function call; the parser would just call the user-supplied function saying "hey, this error happened". Importantly, the caller would provide the memory to store errors, if they indeed want to store them.
I've implemented the closure approach on a branch (https://github.com/karwa/whatwg-url/commits/validation-error-callback) and hooked it up roughly to the live viewer.
The constructor for the URL
type is generated from IDL or something and I'm not sure how to go around it, so you'll need to forward that argument manually by editing the generated constructor in lib/URL.js
(line 115) and adding:
{
let curArg = arguments[2];
if (curArg !== undefined && typeof curArg === "function") {
args.push(curArg);
}
}
But yeah, it illustrates the approach. Since it's a callback function we can also add supplementary information about the error such as the pointer position, without having to worry about the memory overhead of storing all this stuff.
This is exciting work!
I think we only want to expose this capability through the low-level URL Standard API, not the new URL()
API. That should bypass any worries about conflicting with the generated code. It does mean the live viewer would need to switch which API it uses, but that should be fine.
Other things we'd need to do before merging this:
It may also be worth benchmarking the results of a no-op default onValidationError
, versus a null/undefined default onValidationError
that is always called with this.onValidationError.?(...)
.
I also like the onValidationError
approach because we could avoid the performance cost of #59 by doing something like
if (this.onValidationError && !isURLCodePoint(c) && c !== p("%")) {
this.onValidationError("invalid-URL-unit");
}
instead of just
if (!isURLCodePoint(c) && c !== p("%")) {
this.onValidationError.?("invalid-URL-unit");
}
There are a few other instances of that pattern as well that might be worth optimizing out, e.g.
if (c === p("%") &&
(!infra.isASCIIHex(this.input[this.pointer + 1]) ||
!infra.isASCIIHex(this.input[this.pointer + 2]))) {
this.onValidationError("invalid-URL-unit");
}
could remove collapse the three checks down into one.
From the table, I've only done the "URL parsing" section, not the IDNA or host parsing sections.
Having this in the reference implementation has already helped me confirm a couple of issues with the validation errors reported by the standard (bugs and PRs to
WHATWG/url
incoming). It can also be useful for other implementations to check the errors they report. Hopefully we can add expected validation errors to the WPT test suite in due course, as well.That said, these results are not exposed to the live viewer yet. I've hacked something together locally, but I'm not sure of the cleanest way to thread that through to the viewer.