tc39 / proposal-binary-ast

Binary AST proposal for ECMAScript
965 stars 23 forks source link

AssertedVarScope needs the actual kind of each name #40

Closed arai-a closed 6 years ago

arai-a commented 6 years ago

Currently AssertedVarScope has the following structure:

interface AssertedVarScope {
  // checked eagerly during transformation
  attribute FrozenArray<IdentifierName> lexicallyDeclaredNames;
  attribute FrozenArray<IdentifierName> varDeclaredNames;

  // checked lazily as inner functions are invoked
  attribute FrozenArray<IdentifierName> capturedNames;
  attribute boolean hasDirectEval;
};

So at least it doesn't distinguish between let and const.

This is troublesome for streaming compilation because the type of each name is unknown until we hit VariableDeclaration, which can appear later in the statement list. (there can be many another statements before it), and we cannot create efficient representation of the scope until hitting all of them.

Also, it would be nice if it also distinguish between var and function, which might be too-SpiderMonkey specific tho.

arai-a commented 6 years ago

AssertedBlockScope also needs to distinguish between let and const:

interface AssertedBlockScope {
  // checked eagerly during transformation
  attribute FrozenArray<IdentifierName> lexicallyDeclaredNames;

  // checked lazily as inner functions are invoked
  attribute FrozenArray<IdentifierName> capturedNames;
  attribute boolean hasDirectEval;
};
Yoric commented 6 years ago

(first post on microsoft github, odd)

Yeah, that would make sense. We could have an

interface ScopeNames {
  attribute FrozenArray<IdentifierName> declaredNames;
  attribute FrozenArray<IdentifierName> capturedNames;
}

and then rewrite

interface AssertedBlockScope {
   attribute ScopeNames? letNames;
   attribute ScopeNames? constNames; 
   attribute boolean hasDirectEval;
}

etc.

arai-a commented 6 years ago

also, adding class separately from let might be nice. SpiderMonkey currently doesn't distinguish between them, but adding class will help showing better error message for redeclaration across 2 files or from eval.

syg commented 6 years ago

The intention of the AssertedScopes is to have the minimal amount of information needed to enable computing binding locations (i.e. frame or environment slots) and accesses to those binding locations ahead of time.

Distinguishing non-const from const is strictly not needed, because any access to the const before the actual declaration is encountered is a TDZ error, and thus do not need to distinguish const from non-const. That said, non-const vs const seems useful to know ahead of time since an implementation could reasonably choose to put consts elsewhere than other lexical bindings.

Also, I'll probably rename let names to nonConstLexical or something, in that there are other lexical bindings that aren't let forms.

Distinguishing classes and top-level functions are harder to justify. For better error messages with classes in SM, you can add a new DeclarationKind for classes.

arai-a commented 6 years ago

The intention of the AssertedScopes is to have the minimal amount of information needed to enable computing binding locations (i.e. frame or environment slots) and accesses to those binding locations ahead of time.

I'm not sure how other implementations handle slot and declaration kind tho, for SpiderMonkey, slot offset implies the declaration kind (including let/const/var/function), so simply using the index in the AssertdScope's array doesn't work, but we'll need to reorder bindings based on declaration kind, either at the beginning of the compilation (which is not possible without lookahead), or gradually create the scope object while compiling the file, and fixup each bytecode's slot data once the actual slot offset is known (which could be done, but costs).

For better error messages with classes in SM, you can add a new DeclarationKind for classes.

it's already added, and works when we hit redeclaration in the single compilation unit. https://searchfox.org/mozilla-central/rev/292d295d6b084b43b70de26a42e68513bb7b36a3/js/src/frontend/NameAnalysisTypes.h#84

what I intended is the case that redeclaration happens across 2 compilation unit. (https://bugzilla.mozilla.org/show_bug.cgi?id=1428672 for non-binjs case, which I guess is going to add class range to the offset) what we have at that point is only Scope object (of course, this restriction would also be too SpiderMonkey-specific)

syg commented 6 years ago

I'm not sure how other implementations handle slot and declaration kind tho, for SpiderMonkey, slot offset implies the declaration kind

Right, I originally designed it that way as a packed representation. At the cost of memory, you can imagine a scope having an array of pairs of (name, kind). The only real constraints at the time when I originally designed it was that all lexical bindings should be in a contiguous range, so that we can TDZ an entire range instead of one slot at a time.

I agree fixing up would be really gross. I'm fine with adding the let/const distinction since it's pretty fundamental to the language. Other additions you want to add to Scope::Data would need to be fixed up somehow. Perhaps we can come up with a better packed Scope::Data representation that can keep slot location logic separate from slot type logic in SM?

syg commented 6 years ago

@arai-a I have pushed a refactoring of asserted scoeps to https://binast.github.io/ecmascript-binary-ast/

Search for "AssertedDeclaredKind", and check out the updated CheckAssertedScope Abstract Operation. I've also added an isSimpleParameterList to AssertedParameterScope to distinguish when its bindings are in TDZ.

Let me know if you think this'll work for you.

arai-a commented 6 years ago

Thanks! Filed https://bugzilla.mozilla.org/show_bug.cgi?id=1467052 to think about "var" vs "function".

arai-a commented 6 years ago

Fortunately, the var vs function can be represented without extra memory cost in SpiderMonkey, thanks to whoever invented tagged pointer. Given that let vs const is handled in the spec, what we should handle in the impl is only var vs function, and let vs class, so 1 bit is sufficient.

So, I think the change works :)