tc39 / proposal-binary-ast

Binary AST proposal for ECMAScript
964 stars 23 forks source link

Why allow early errors to be serialized? #3

Open bmeck opened 7 years ago

bmeck commented 7 years ago

Curious as to why even allow early errors to become an AST?

syg commented 7 years ago

In practice, I expect that compilers are free to transform early error cases in text JS into a single throw, or just refuse to emit an AST. However, implementations must be able to handle malformed ASTs in case of malicious actors or buggy compilers. In that case, we feel requiring the entire tree be validated to be error-free would limit performance gains to be had. Thus, the current idea is to defer such errors until that part of the tree is looked at by the implementation.

bmeck commented 7 years ago

A runtime throw is different from an early error, for example when trying to load ESM a runtime throw allows partial graph evaluation while an early error does not. A tool could generate a simplified early error I guess, but that seems odd.

syg commented 7 years ago

Indeed, they are very different now. The proposal defers all early errors to "first invocation of function". In the proposal world, partial graph evaluation is always possible.

bmeck commented 7 years ago

Need to think on this as it means all early errors could change how ESM graphs load if they are transformed to this AST. It seems like a new execution mode more than simple AST transform at that point.

syg commented 7 years ago

Yes, this change in early error behavior is the most worrisome behavior change. Our guiding intuition was little (if any) code depends on having early errors. Is that misguided? Would love some examples if so.

syg commented 7 years ago

Also, this proposal is primarily motivated by implementations where cold load times matter, like the browser. I despise the phrase "normative optional", but it'd be nice to have the ecosystem keep the ability to ship only text source.

bmeck commented 7 years ago

I think it could be plausible that problems could arise from having partially evaluated dep graphs. It would be rather surprising to see that. I don't have any examples because no examples can run w/ early errors currently (this proposal would change that).

bmeurer commented 7 years ago

Since this is designed as strictly output format for tools, I don't think we need any early errors here. They can be reported by the tools and don't add any value if being deferred. It's a lot of extra complexity that can be saved by avoiding them completely (thus making it more likely that you get browser vendor buy-in).

Yoric commented 7 years ago

@bmeurer How do you propose we represent errors caused by invalid files?

bmeurer commented 7 years ago

Fail the build step.

bakkot commented 7 years ago

@bmeurer well-designed tools should fail the build step, of course, but you can imagine a buggy compiler which outputs a binary AST which represents a program with an early error. Implementations need to do something with such an input.

bmeurer commented 7 years ago

Right. Of course browsers cannot blindly trust arbitrary input. Invalid functions can just fail on first invocation.

kannanvijayan-zz commented 7 years ago

@bmeurer Yeah the expectation is that the build tools will emit correct code, and fail on bad code. The lazy error choice is purely to support fast safe parsing in the engine. Lazy errors and scope issues are the two main reasons parsers are forced to scan all bytes of an input program, and the main approach we're using to speed up parsing here is to skip parsing of code we don't execute.

We understand there are issues with semantic divergence, but the expectation is that the amount of non-testcase code out there that depends on early syntax errors for semantic correctness is almost nil.

bmeck commented 7 years ago

Nothing can depend on early errors in the wild today, this changes it so early errors are not early ; meaning, things can start to depend on being compiled so that early errors are deferred to runtime.

bakkot commented 7 years ago

Nothing can depend on early errors in the wild today,

Sure it can!

<html>
<head>
<script>
window.onerror = function(e){
  console.log('reached');
};
</script>
<script>
return; // oh no!
</script>
</head>
</html>
bmeck commented 7 years ago

I agree if we are talking about multiple sources that maybe be true. However, my point that this enables a mode in which early errors being deferred / not early is required is problematic.

bmeck commented 7 years ago

As an examples:

If you strip local binding names to be non-string based. Errors like eval/arguments being invalid identifiers can no longer occur.

If the AST doesn't support with, that early error cannot occur.

etc.

If an AST is malformed due to grammar instead of early errors, how does that differ in being ignored until runtime? How does the AST determine if something is malformed for things like unterminated lists of statements from overflowing into other lists? Is that even an error?

I think moving the early errors to be unable to be serialized seems sane. I am not ok with having code behavior of early errors change if it introduces places that cause evaluation of source text containing early errors which would not evaluate when decompiled into JS again.

kannanvijayan-zz commented 7 years ago

@bmeck That's an interesting point about keywords. We might want to enforce the same constraints as the text syntax for valid identifiers.

On with, binjs should definitely have it. We want a bijective mapping (up to the semantic equivalence guarantees) between a JS source file and a binjs file.

I think moving the early errors to be unable to be serialized seems sane. I am not ok with having code behavior of early errors change if it introduces places that cause evaluation of source text containing early errors which would not evaluate when decompiled into JS again.

This depends on the scope of the "source text" you're referring to. If a particular function definition contains an error directly, that function body effectively becomes equivalent to the statement throw new Exception(...). The containing function would still execute fine.

What do you think of that?

bakkot commented 7 years ago

with

Tangential, but, I am strongly opposed to allowing this format to represent sloppy mode scripts.

bmeck commented 7 years ago

@kannanvijayan as mentioned earlier changing an Early Error to throw breaks the semantics and invalidates your earlier desire:

We want a bijective mapping (up to the semantic equivalence guarantees) between a JS source file and a binjs file.

kannanvijayan-zz commented 7 years ago

@bmeck I don't agree that it does. The rough semantic equivalence I was going for was "if the code you load has no syntax errors, semantics are exactly equivalent. If it does have syntax errors (i.e. constraint violations, malformed encoded subtrees), then in binjs they are not raised, and the function directly containing the error is converted into an always-throw function".

That divergence seems troubling, but it is in fact really difficult to run into.

If you have a syntax error in your original source, the build tool will reject it. So to trigger this divergence, a developer would have to:

  1. Compile their no-syntax-error JS down to binjs.
  2. Manually go in and edit the binjs file to corrupt it.
  3. Skip any validation steps in the deploy process that would error out on the bad binjs file.
  4. Deploy.

And if a developer is trying that hard to get at the semantic divergence, then I don't see the harm in letting them have it. Clearly they have an idea of what they want to do.

The "bijective mapping" part is mainly to emphasize that the binjs file contains no more or less information than the source file it was converted from. To make the claim firm that we're not adding new syntactic forms so much as rearranging the existing information in a way that's faster for parsers to deal with.

bmeck commented 7 years ago

I would disagree on "semantics are exactly equivalent" if execution differs.

kannanvijayan-zz commented 7 years ago

@bakkot

Tangential, but, I am strongly opposed to allowing this format to represent sloppy mode scripts.

I would love it if we could do this, and maybe we can, but I'm not sure.

This whole initiative was borne out of practical concerns: sites are shipping megabytes of JS, parse times are becoming insane, and we need a good way to kill that bottleneck.

My intuitive guess is that most of the code on the web is non-strict, so if we don't support it then we're basically building a toy. I'm not sure though, I don't know how much code in the ecosystem these days is strict.

I love strict. You love strict. We all love strict. I'd love to restrict this to strict mode code. I don't know if that's a realistic possibility.

kannanvijayan-zz commented 7 years ago

@bmeck

I would disagree on "semantics are exactly equivalent" if execution differs.

Well, that particular claim was prefixed with a predicate :)

The desired developer assurance here is: "All your code will run exactly the same, unless you really go out of your way to dig at the semantic difference".

syg commented 7 years ago

We do have an opportunity here to perhaps not support sloppy mode, in that we can leverage build tooling to fail to compile existing sloppy code until they're fixed to be strict. That, however, worries me as it would hurt adoption. I envision this to be a very low-to-web-dev-cost "load faster" flag they pass to Babel or whatever. If it's a "load faster, right after you rewrite your stuff in strict mode" flag, it might not get traction.

bakkot commented 7 years ago

Babel as most people use it defaults to outputting strict mode code (including adding a "use strict" directive), so I'm less concerned than you are about the adoption risk to people already using build tools.

syg commented 7 years ago

@bakkot Ooh, interesting data point. Shows you how fake of a web dev I am.

kannanvijayan-zz commented 7 years ago

@bakkot that is very interesting indeed. Good to know.

vdjeric commented 7 years ago

I took a look at some sites, Facebook and LinkedIn only employ "use strict" within some functions (not all), and GMail doesn't "use strict" at all. So I think we'll have to support sloppy mode if want wide adoption on big sites

ljharb commented 7 years ago

… or evangelize strict mode to big sites.

vdjeric commented 7 years ago

And Facebook does use Babel for transforming syntax and minifying but we have too much legacy JS code that is hard to transform to strict mode, and apparently some issues in the transformation pipeline wrt strictness too. The end-result is that some JS functions are strict and others are not.

Becavalier commented 7 years ago

I am not really sure what the benefit is if we defer the early error throwing? Anyone could help explain this? What's the relationship between "pre-parse" and "backfire"? The backfire process can be eliminated by IIFE in v8 and the "pre-parse" will reduce the size of initializing ast, that's all what I know currently.

littledan commented 7 years ago

This is all about improving the startup performance, right? In other similar systems, such the JVM, validation is done upfront, leading to a significant source of overhead. The SpiderMonkey bug thread with the prototype implementation shows that delaying validation improves startup time significantly. Many websites ship a lot of dead code (e.g., Facebook).

robpalme commented 7 years ago

I detect violent agreement in this thread ;-)

Everyone agrees that the build tool must and will reject source code that contains early errors.

No one is suggesting that early errors be deferred.

Therefore any generated file that contains an early error is by definition corrupt, e.g. maliciously generated or a compiler bug.

The only debate is whether to force the runtime engine to reject corrupt files on load or later during evaluation. Given that it is an explicit goal to allow gradual/efficient parsing, we have to accept that corrupt files will result in errors sometime during evaluation. A consequence is that it becomes prudent to ensure error compatibility across implementations; something README.md explicitly addresses.

kannanvijayan-zz commented 7 years ago

@robpalme excellent synopsis :) Thank you.

Yoric commented 7 years ago

+1

So, perhaps we should use a different term. Maybe "corruption error", instead of "early errors".

kannanvijayan-zz commented 7 years ago

It might be worthwhile to refer to this in terms of well-formedness (AST is well formed and readable), and consistency (attribute grammar fields properly describe the code).

Yoric commented 7 years ago

Fine by me.