oxc-project / oxc

⚓ A collection of JavaScript tools written in Rust.
https://oxc.rs
MIT License
11.14k stars 405 forks source link

Untracked `arguments` in function scopes. #4232

Open rzvxa opened 2 months ago

rzvxa commented 2 months ago

Related to this: https://github.com/oxc-project/oxc/issues/3374#issuecomment-2223483123

We do not keep track the arguments object in functions, It causes false positives in linters as of now and it can impose more serious limitations or errors in other crates(transformers and minifier come to mind as we might shadow the function arguments by mistake).

rzvxa commented 2 months ago

Since you are heavily invested in the scopes right now, @Dunqing What do you think?

overlookmotel commented 2 months ago

Funny, I was wondering how we handle arguments just this morning. Sounds like we don't.

I can see 3 options:

  1. Add arguments binding to all function scopes eagerly.
  2. Add arguments binding to function scopes lazily, only once a reference to it is found in function body (or params).
  3. Special case arguments so IdentifierReference for it would have no SymbolId.

Personally, I would tend towards (2) - simple, and avoids perf hit of (1).

NB: arguments is almost like an invisible function param, but there is one oddity in sloppy mode:

function f1(x) {
  // SyntaxError: Identifier 'x' has already been declared
  let x;
}

function f2(arguments) {
  // SyntaxError: Identifier 'arguments' has already been declared
  let arguments;
}

function f3() {
  // No problem
  let arguments;
}

Also, you can have 2 bindings for arguments in a function:

function f(x, y = arguments) {
    let arguments = 456;
    console.log(y[0]); // Logs 123
    console.log(arguments); // Logs 456
}
f(123);

So we'd need an extra scope for function body.

rzvxa commented 2 months ago

I think the option 3 isn't a really good approach here. I'm torn between 1 and 2.

The latter seems like a good thing for performance but I'm afraid it would cause issues later on because javascript is whimsical at times:

function f() {
    eval("console.log(arguments)");
}

f("a", "b", "c")

With the second approach, we still have the same issue of shadowing arguments and generating the wrong code!

overlookmotel commented 2 months ago

He had to go and bring eval() into it... 😄

But even then, I can't actually see how that can go wrong. eval() will (I assume) prevent mangling names of all bindings which it can "see". And we can just ban creating new bindings called arguments in the transformer (that's illegal in strict mode anyway).

Boshen commented 2 months ago

https://github.com/eslint/eslint-scope/blob/8082caa1a0f9aae0894a0a01fef9efa7a5e509f6/lib/scope.js#L671

rzvxa commented 2 months ago

He had to go and bring eval() into it... 😄

😆

But even then, I can't actually see how that can go wrong. eval() will (I assume) prevent mangling names of all bindings which it can "see". And we can just ban creating new bindings called arguments in the transformer (that's illegal in strict mode anyway).

You are right that we shouldn't ever mangle arguments if it isn't user-defined. I'm just saying since arguments are a context-sensitive identifier and javascript allows for weird ways to access variables through non-conventional ways it might cause issues further down the road.

Maybe no sane person uses new Function or eval in the production code, But javascript frameworks might generate such code and expect us to bundle it correctly for them.

I might be overthinking this.

Dunqing commented 2 months ago

We should add an argument symbol to the function scope. Both TypeScript and typescript-eslint do the same thing

overlookmotel commented 2 months ago

What do you think of this example?

// Sloppy mode
function f(x, y = arguments) {
    let arguments = 456;
    console.log(y[0]); // Logs 123
    console.log(arguments); // Logs 456
}
f(123);

This demonstrates there can be 2 distinct bindings for arguments in a function. arguments in y = arguments references a different binding from let arguments = 456.

rzvxa commented 2 months ago

I was reading the eslint-scope tests today when saw an oddity I couldn't figure.

It contains arguments in the global scope. Your comment made me go and look it up, Turns out they always fake arguments in all non-function scopes. Here's my educated guess about how it is done there:

With these steps, your code would work similarly to this:

let arguments = [123];
function f(x, y = arguments) {
    let arguments = 456;
    console.log(y[0]); // Logs 123
    console.log(arguments); // Logs 456
}
f(123);
overlookmotel commented 2 months ago

Treating arguments as external to the function just seems weird. And it can't handle the case where there's another arguments binding in scope above e.g.:

// Sloppy mode
let arguments;
function f(x, y = arguments) {
    let arguments = 456;
}

Here we have 3 bindings for arguments, and only 2 scopes to put them in. I know this is an edge case, and will be very rare in practice, but Oxc aims for absolute correctness.

Only solutions I can see are:

  1. Always add an extra scope for function body.
  2. Only add an extra scope for function body if there's a reference to arguments in function params.
  3. Treat implicit arguments as a special case and don't create a binding for it.

(1) is simpler and more conceptually correct, but imposes quite a large penalty. Every function generates 2 scopes, and functions are a common construct, so it will slow down symbol resolution significantly. (3) has been rejected. So I would lean towards (2).

rzvxa commented 2 months ago

I personally would prefer to find a way to avoid unnecessary scopes, Can't we declare arguments as a hoisted variable on the function scope? with that, we should be able to redeclare it in the function body.

But that's just my preference and I have nothing against your approach here, It just feels wrong to add scopes left and right as a means to simplify stuff. As you said correctness is more important. I'm not a huge fan of premature optimization.

I just have a quick question. We are wrapping the whole function in the scope right now. Is this going to change? Is the scope going to be entered after the parameters?

overlookmotel commented 2 months ago

To clarify, what I'm saying about option (2) above is:

i.e. Handle the weird uncommon edge case as a de-opt. No overhead from extra scope in common case, but maintain complete correctness.

rzvxa commented 2 months ago

To clarify, what I'm saying about option (2) above is:

* By default, only create 1 scope for the function (as now).

* Add binding for `arguments` to that scope.

* _Only if_ `arguments` is referenced in the function's params, create a further scope for function body. This will be very rare.

i.e. Handle the weird uncommon edge case as a de-opt. No overhead from extra scope in common case, but maintain complete correctness.

Actually, it's not as bad as I initially thought. I'm just afraid of something like this.

When you are accessing the Ancestors of a scope you have to check if that scope is a legit or a fake one to prevent issues in places when we want to get the "actual" parent with something like scope.ancestors(xxx).nth(1) or scope.get_parent_id(xxx).

It also means that there are now some really obscure edge cases with arguments that won't get tested, It is not just about parent scope, With this approach, you'd have some identifiers that are pointing to the parent of a function's body but aren't the actual scope of the enclosing node. I feel it would cause a lot of confusion and heisenbugs for both us(especially contributions that don't get reviews by people who are aware of this) and our crate users.

What are your thoughts on this? Even if we do all of these checks doesn't it mean we won't get much performance out of this in the long run(as we start to use scope more and more)?


Edit:

I've checked the spec and even there, They only create a new lexical environment if strict is false to prevent eval from creating a new binding. Otherwise, they would always use the lexical environment of the function's callee. I know these aren't apple to apple comparisons as environments are a runtime concept and aren't 1:1 with scopes. But it might help.

image

image

But maybe I'm missing something here.

https://tc39.es/ecma262/multipage/ordinary-and-exotic-objects-behaviours.html#sec-functiondeclarationinstantiation https://tc39.es/ecma262/multipage/ordinary-and-exotic-objects-behaviours.html#sec-arguments-exotic-objects

Dunqing commented 2 months ago

I found out how TypeScript handles arguments. TypeScript adds an arguments symbol but does not add it in scopes. https://github.com/microsoft/TypeScript/blob/d8086f14b6b97c0df34a0cc2f56d4b5926a0c299/src/compiler/utilities.ts#L11361-L11380

Boshen commented 2 months ago

I found out how TypeScript handles arguments. TypeScript adds an arguments symbol but does not add it in scopes. https://github.com/microsoft/TypeScript/blob/d8086f14b6b97c0df34a0cc2f56d4b5926a0c299/src/compiler/utilities.ts#L11361-L11380

Nice find. I also see tsc defines

    globals: SymbolTable;
    argumentsSymbol: Symbol;
    requireSymbol: Symbol;
overlookmotel commented 2 months ago

TypeScript adds an arguments symbol but does not add it in scopes.

TS may do it, but it seems like a bad idea to me. Rather confusing, no?

Boshen commented 1 week ago

For the record, I added this to the mangler

https://github.com/oxc-project/oxc/blob/0a2f68756f38f8ef003425e856061502b0c97a0a/crates/oxc_mangler/src/lib.rs#L229-L231

overlookmotel commented 1 week ago

arguments can be renamed if it's a local const, let, or function declaration declared in function scope.

// Sloppy mode
function outer() {
    let arguments = 123;
    console.log(arguments);
}

function outer2() {
    function arguments() {}
    console.log(arguments);
}

But not if it's a var, or hoisted function declaration:

// Sloppy mode
function outer() {
    console.log(arguments); // Logs [1, 2, 3]
    var arguments = 456;
    console.log(arguments); // Logs 456
}
outer(1, 2, 3);

function outer2() {
    console.log(arguments); // Logs [1, 2, 3]
    {
        function arguments() {}
    }
    console.log(typeof arguments); // Logs 'function'
}
outer2(1, 2, 3);
overlookmotel commented 1 week ago

Personally I'm in favour of adding scopes for function bodies anyway for other reasons (#5017), so that'd resolve some of the questions above.