Open bmeck opened 7 years ago
In practice, I expect that compilers are free to transform early-error cases in text JS into a single throw, or to just refuse to emit an AST. However, implementations must be able to handle malformed ASTs in case of malicious actors or buggy compilers. Given that, we feel that requiring the entire tree to be validated as error-free would limit the performance gains to be had. Thus, the current idea is to defer such errors until that part of the tree is looked at by the implementation.
A runtime throw is different from an early error. For example, when loading ESM, a runtime throw allows partial graph evaluation while an early error does not. A tool could generate a simplified early error, I guess, but that seems odd.
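The timing difference can be sketched in plain JS, using `new Function` as a stand-in for compiling a script or module (the snippet is illustrative, not binjs code):

```javascript
// An early error is raised at compile time, before any statement runs;
// a runtime error fires only when the faulty statement executes.
let compiled = false;
try {
  // `break` outside a loop is an early error: the log never runs,
  // even though it appears before the offending statement.
  new Function('console.log("never runs"); break;');
  compiled = true;
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
console.log(compiled); // false

// A runtime throw lets everything before it execute, which is why it
// permits partial evaluation where an early error would not.
globalThis.ran = false;
const f = new Function('globalThis.ran = true; missingFn();');
try { f(); } catch (e) { /* ReferenceError, raised at call time */ }
console.log(globalThis.ran); // true
```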
Indeed, they are very different now. The proposal defers all early errors to "first invocation of function". In the proposal world, partial graph evaluation is always possible.
Need to think on this as it means all early errors could change how ESM graphs load if they are transformed to this AST. It seems like a new execution mode more than simple AST transform at that point.
Yes, this change in early error behavior is the most worrisome behavior change. Our guiding intuition was little (if any) code depends on having early errors. Is that misguided? Would love some examples if so.
Also, this proposal is primarily motivated by implementations where cold load times matter, like the browser. I despise the phrase "normative optional", but it'd be nice to have the ecosystem keep the ability to ship only text source.
I think it could be plausible that problems could arise from having partially evaluated dep graphs. It would be rather surprising to see that. I don't have any examples because no examples can run w/ early errors currently (this proposal would change that).
Since this is designed as strictly output format for tools, I don't think we need any early errors here. They can be reported by the tools and don't add any value if being deferred. It's a lot of extra complexity that can be saved by avoiding them completely (thus making it more likely that you get browser vendor buy-in).
@bmeurer How do you propose we represent errors caused by invalid files?
Fail the build step.
@bmeurer well-designed tools should fail the build step, of course, but you can imagine a buggy compiler which outputs a binary AST which represents a program with an early error. Implementations need to do something with such an input.
Right. Of course browsers cannot blindly trust arbitrary input. Invalid functions can just fail on first invocation.
@bmeurer Yeah the expectation is that the build tools will emit correct code, and fail on bad code. The lazy error choice is purely to support fast safe parsing in the engine. Lazy errors and scope issues are the two main reasons parsers are forced to scan all bytes of an input program, and the main approach we're using to speed up parsing here is to skip parsing of code we don't execute.
We understand there are issues with semantic divergence, but the expectation is that the amount of non-testcase code out there that depends on early syntax errors for semantic correctness is almost nil.
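A minimal sketch of that lazy strategy in plain JS, again using `new Function` as a stand-in for decoding an encoded function body (the `lazyCompile` helper is invented for illustration):

```javascript
// "Validate on first invocation": the body is only checked/compiled the
// first time the function is called, so a corrupt function costs nothing
// unless it actually runs.
function lazyCompile(source) {
  let compiled = null;
  return function (...args) {
    if (compiled === null) {
      // a malformed body throws SyntaxError here, at first call
      compiled = new Function(...source.params, source.body);
    }
    return compiled.apply(this, args);
  };
}

const good = lazyCompile({ params: ['x'], body: 'return x * 2;' });
const bad = lazyCompile({ params: [], body: 'return +;' }); // malformed body

console.log(good(21)); // 42
try { bad(); } catch (e) { console.log(e instanceof SyntaxError); } // true
```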
Nothing can depend on early errors in the wild today; this changes it so early errors are not early, meaning things can start to depend on being compiled so that early errors are deferred to runtime.
> Nothing can depend on early errors in the wild today
Sure it can!
```html
<html>
  <head>
    <script>
      window.onerror = function (e) {
        console.log('reached');
      };
    </script>
    <script>
      return; // oh no!
    </script>
  </head>
</html>
```
I agree that if we are talking about multiple sources, that may be true. However, my point stands: this enables a mode in which early errors being deferred (i.e. not early) is required, and that is problematic.

Some examples:

- If you strip local binding names so they are no longer string based, errors like `eval`/`arguments` being invalid identifiers can no longer occur.
- If the AST doesn't support `with`, that early error cannot occur.
- etc.
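As a quick check in plain text syntax, the `eval` binding restriction is indeed an early error today, raised at compile time rather than when the declaration executes:

```javascript
// In strict mode, binding `eval` (or `arguments`) is an early error:
// the SyntaxError fires when the function is compiled, before any call.
let earlyError = null;
try {
  new Function('"use strict"; var eval = 1;');
} catch (e) {
  earlyError = e;
}
console.log(earlyError instanceof SyntaxError); // true
```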
If an AST is malformed due to grammar instead of early errors, how does that differ in being ignored until runtime? How does the AST determine if something is malformed for things like unterminated lists of statements from overflowing into other lists? Is that even an error?
I think moving the early errors to be unable to be serialized seems sane. I am not ok with having code behavior of early errors change if it introduces places that cause evaluation of source text containing early errors which would not evaluate when decompiled into JS again.
@bmeck That's an interesting point about keywords. We might want to enforce the same constraints as the text syntax for valid identifiers.
On `with`, binjs should definitely have it. We want a bijective mapping (up to the semantic equivalence guarantees) between a JS source file and a binjs file.
> I think moving the early errors to be unable to be serialized seems sane. I am not ok with having code behavior of early errors change if it introduces places that cause evaluation of source text containing early errors which would not evaluate when decompiled into JS again.
This depends on the scope of the "source text" you're referring to. If a particular function definition contains an error directly, that function body effectively becomes equivalent to the statement `throw new SyntaxError(...)`. The containing function would still execute fine.
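A minimal sketch of that behavior in plain JS (the throwing body stands in for what a decoder might emit for a malformed subtree; all names are invented):

```javascript
// Invented illustration: a decoder replaces a malformed inner function's
// body with an unconditional throw, leaving the containing function intact.
function outer() {
  function inner() {
    // stands in for a function whose encoded body failed validation
    throw new SyntaxError('malformed function body in encoded AST');
  }
  return 'outer still runs';
}
console.log(outer()); // the error only surfaces if inner() is ever called
```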
What do you think of that?
> `with`
Tangential, but, I am strongly opposed to allowing this format to represent sloppy mode scripts.
@kannanvijayan as mentioned earlier, changing an early error into a runtime `throw` breaks the semantics and invalidates your earlier desire:

> We want a bijective mapping (up to the semantic equivalence guarantees) between a JS source file and a binjs file.
@bmeck I don't agree that it does. The rough semantic equivalence I was going for was "if the code you load has no syntax errors, semantics are exactly equivalent. If it does have syntax errors (i.e. constraint violations, malformed encoded subtrees), then in binjs they are not raised, and the function directly containing the error is converted into an always-throw function".
That divergence seems troubling, but it is in fact really difficult to run into.
If you have a syntax error in your original source, the build tool will reject it. So to trigger this divergence, a developer would have to:
And if a developer is trying that hard to get at the semantic divergence, then I don't see the harm in letting them have it. Clearly they have an idea of what they want to do.
The "bijective mapping" part is mainly to emphasize that the binjs file contains no more or less information than the source file it was converted from. To make the claim firm that we're not adding new syntactic forms so much as rearranging the existing information in a way that's faster for parsers to deal with.
I would disagree on "semantics are exactly equivalent" if execution differs.
@bakkot
> Tangential, but, I am strongly opposed to allowing this format to represent sloppy mode scripts.
I would love it if we could do this, and maybe we can, but I'm not sure.
This whole initiative was borne out of practical concerns: sites are shipping megabytes of JS, parse times are becoming insane, and we need a good way to kill that bottleneck.
My intuitive guess is that most of the code on the web is non-strict, so if we don't support it then we're basically building a toy. I'm not sure though, I don't know how much code in the ecosystem these days is strict.
I love strict. You love strict. We all love strict. I'd love to restrict this to strict mode code. I don't know if that's a realistic possibility.
@bmeck
> I would disagree on "semantics are exactly equivalent" if execution differs.
Well, that particular claim was prefixed with a predicate :)
The desired developer assurance here is: "All your code will run exactly the same, unless you really go out of your way to dig at the semantic difference".
We do have an opportunity here to perhaps not support sloppy mode, in that we can leverage build tooling to refuse to compile existing sloppy code until it is rewritten to be strict. That, however, worries me, as it would hurt adoption. I envision this as a very low-cost-to-web-devs "load faster" flag they pass to Babel or whatever. If it's a "load faster, right after you rewrite your stuff in strict mode" flag, it might not get traction.
Babel as most people use it defaults to outputting strict mode code (including adding a `"use strict"` directive), so I'm less concerned than you are about the adoption risk to people already using build tools.
@bakkot Ooh, interesting data point. Shows you how fake of a web dev I am.
@bakkot that is very interesting indeed. Good to know.
I took a look at some sites: Facebook and LinkedIn only employ `"use strict"` within some functions (not all), and GMail doesn't use `"use strict"` at all. So I think we'll have to support sloppy mode if we want wide adoption on big sites
… or evangelize strict mode to big sites.
And Facebook does use Babel for transforming syntax and minifying but we have too much legacy JS code that is hard to transform to strict mode, and apparently some issues in the transformation pipeline wrt strictness too. The end-result is that some JS functions are strict and others are not.
I am not really sure what the benefit is of deferring the early error throwing. Could anyone help explain this? What's the relationship between "pre-parse" and "backfire"? The backfire process can be eliminated by IIFEs in V8, and the pre-parse will reduce the size of the initial AST; that's all I know currently.
This is all about improving startup performance, right? In other similar systems, such as the JVM, validation is done upfront and is a significant source of overhead. The SpiderMonkey bug thread with the prototype implementation shows that delaying validation improves startup time significantly. Many websites ship a lot of dead code (e.g., Facebook).
I detect violent agreement in this thread ;-)
Everyone agrees that the build tool must and will reject source code that contains early errors.
No one is suggesting that early errors be deferred.
Therefore any generated file that contains an early error is by definition corrupt, e.g. maliciously generated or a compiler bug.
The only debate is whether to force the runtime engine to reject corrupt files on load or later during evaluation. Given that it is an explicit goal to allow gradual/efficient parsing, we have to accept that corrupt files will result in errors sometime during evaluation. A consequence is that it becomes prudent to ensure error compatibility across implementations; something README.md explicitly addresses.
@robpalme excellent synopsis :) Thank you.
+1
So, perhaps we should use a different term. Maybe "corruption error", instead of "early errors".
It might be worthwhile to refer to this in terms of well-formedness (AST is well formed and readable), and consistency (attribute grammar fields properly describe the code).
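A toy sketch of the two checks (node shapes and field names here are invented for illustration, not actual binjs structures):

```javascript
// Well-formedness: the tree is structurally readable.
// Consistency: attribute-grammar fields actually describe the code.
const fn = {
  type: 'FunctionDecl',
  declaredVars: ['x'], // precomputed attribute-grammar field
  body: [
    { type: 'VarDecl', name: 'x' },
    { type: 'VarDecl', name: 'y' }, // not listed in declaredVars
  ],
};

function isWellFormed(node) {
  return !!node && typeof node.type === 'string' &&
    (node.body || []).every(isWellFormed);
}

function isConsistent(node) {
  const actual = node.body.filter(n => n.type === 'VarDecl').map(n => n.name);
  return actual.every(name => node.declaredVars.includes(name));
}

console.log(isWellFormed(fn)); // true: readable tree
console.log(isConsistent(fn)); // false: `y` is missing from declaredVars
```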
Fine by me.
Curious as to why we would even allow early errors to be encoded into an AST in the first place?