pionxzh / wakaru

🔪📦 Javascript decompiler for modern frontend
https://wakaru.vercel.app/
MIT License
332 stars 20 forks source link

Scoping issue #32

Open pionxzh opened 1 year ago

pionxzh commented 1 year ago

The current identifier renaming is not 100% accurate. By inspecting the unpacking snapshot, you can tell that some variable was wrongly renamed to export or require during the unpacking process. Mostly because ast-types are giving us wrong scope information, and it's no longer well-maintained. We need to either patch it or find an alternative.

The best solution would be to fix it in the upstream. Let's track the progress at https://github.com/facebook/jscodeshift/issues/500

0xdevalias commented 1 year ago

I assume the issue here is related to ast-type's parsing/handling of Scope/etc?

For my own background knowledge/context, it seems that jscodeshift is basically a wrapper around recast, which relies on ast-types:

I wonder if the root issue lies in recast, or ast-types? Potentially related:


I haven't looked closely at the code here and how it's used, but would something like eslint-scope help?

Looking at the linked issue (https://github.com/facebook/jscodeshift/issues/263), it seems that a 'raw' block will lead to scope issues currently.

Example code that breaks:

const x = 42
{
  const x = 47
  console.log(x)
}
console.log(x)

Reviewing eslint-scope / escope sounds like they may also have similar bugs:


I was playing around with some PoC code with eslint-scope this morning, that you can see on the following REPL:

Running espree-eslint-scope_2.js with the breaking example code from https://github.com/facebook/jscodeshift/issues/263 (see above) gives me the following output:

$ node espree-eslint-scope_2.js

-= SCOPE INFO =-
Scope: type=global (block.type=Program) block.id?.name=undefined implicit=x,x,console,x,console,x
Variable: x [
  { type: 'Variable', kind: 'const' },
  { type: 'Variable', kind: 'const' }
] References: 4

-= ANALYSIS RESULT =-
{
  "type": "global",
  "implicitGlobals": [
    "x",
    "x",
    "console",
    "x",
    "console",
    "x"
  ],
  "identifiers": [
    {
      "name": "x",
      "types": [
        {
          "type": "Variable",
          "kind": "const"
        },
        {
          "type": "Variable",
          "kind": "const"
        }
      ]
    }
  ],
  "childScopes": []
}

I think this may mean that eslint-scope may also be incorrectly handling the scope of these variables; as I would expect there to be both the global scope, and the block scope.

It seems ChatGPT agrees with that:

Details **Me:** > Given the following javascript code, how many scopes should there be, and which should be defined in which scope? > > ``` > const x = 42 > { > const x = 47 > console.log(x) > } > console.log(x) > ``` > **ChatGPT:** > In the provided JavaScript code, there are two scopes involved: > > 1. **Global Scope**: The first `const x = 42;` is defined in the global scope. Variables declared in the global scope can be accessed and modified throughout the entire script. > > 2. **Block Scope**: The `{ const x = 47; console.log(x); }` part defines a block scope. The `const x = 47;` within this block is a separate variable that only exists within this block scope. It's important to note that `const` and `let` in JavaScript are block-scoped, meaning they are only accessible within the block they are defined in. > > Here's a breakdown of what's defined in each scope: > > - **Global Scope**: > - `const x = 42;` > > - **Block Scope**: > - `const x = 47;` (This is a different `x` than the one in the global scope.) > > When the `console.log(x);` inside the block is executed, it outputs `47`, which is the value of `x` in the block scope. When the `console.log(x);` outside the block is executed, it outputs `42`, which is the value of `x` in the global scope. > > The two `x` variables, although they share the same name, are distinct and exist in separate scopes.

I've opened a bug for this on eslint-scope, but unless that gets fixed, it doesn't seem like it will be a good solution for the needs here:

Edit: Seems I was wrong about that, see my below comment for updates:


These tools may also be useful for checking how escope handles things:

0xdevalias commented 1 year ago

I just did another test in the REPL, this time just using @babel/parser / @babel/traverse directly (not wrapped with jscodeshift / recast, so not using ast-types):

PoC Code ```js const parser = require('@babel/parser'); const traverse = require('@babel/traverse').default; const code = ` const x = 42; { const x = 47; console.log(x); } console.log(x); `; const ast = parser.parse(code, { sourceType: 'module', // plugins: [ // // Add any necessary plugins // ], }); const scopes = new Set(); function collectScopes(path) { if (path.scope && !scopes.has(path.scope)) { scopes.add(path.scope); path.scope.parent && collectScopes(path.findParent((p) => p.scope)); } } traverse(ast, { enter(path) { collectScopes(path); }, }); console.log("\n-= Scopes and Bindings =-\n"); scopes.forEach((scope) => { console.group( `Scope (uid=${scope.uid}, path.type=${scope.path.type})` ); console.log( 'Bindings:', JSON.stringify(Object.keys(scope.bindings), null, 2) ); // console.log(scope); console.groupEnd(); console.log(); }); ```

This seems to have correctly output the 2 scopes as expected:

$ node babel-scopes.js 

-= Scopes and Bindings =-

Scope (uid=0, path.type=Program)
  Bindings: [
    "x"
  ]

Scope (uid=1, path.type=BlockStatement)
  Bindings: [
    "x"
  ]
0xdevalias commented 1 year ago

I think this may mean that eslint-scope may also be incorrectly handling the scope of these variables; as I would expect there to be both the global scope, and the block scope.

I've opened a bug for this on eslint-scope, but unless that gets fixed, it doesn't seem like it will be a good solution for the needs here:

It seems I was just using eslint-scope wrong, and not parsing the ecmaScript: 6 option to it (Ref)

There are some useful docs links and discoveries in my comment on the above issue:

And I also updated my REPL to fix the code and make it work properly now:

With the very basic minimal example being:

const espree = require('espree');
const eslintScope = require('eslint-scope');

const exampleJSCode = `
const x = 42
{
  const x = 47
  console.log(x)
}
console.log(x)
`

const commonParserOptions = {
  ecmaVersion: 2020,
  sourceType: 'module', // script, module, commonjs
}

// Parse the JavaScript code into an AST with range information
const ast = espree.parse(jsCode, {
  ...commonParserOptions,
  range: true  // Include range information
});

// Analyze the scopes in the AST
// See the .analyze options in the source for more details
//   https://github.com/eslint/eslint-scope/blob/957748e7fb741dd23f521af0c124ce6da0848997/lib/index.js#L111-L131
// See the following for more details on the ScopeManager interface:
//  https://eslint.org/docs/latest/extend/scope-manager-interface
//  https://github.com/eslint/eslint-scope/blob/main/lib/scope-manager.js
const scopeManager = eslintScope.analyze(
  ast,
  {
    ...commonParserOptions,
    nodejsScope: false,
  }
);

console.log('ScopeManager.scopes=', scopeManager.scopes)

See the ScopeManger interface docs (Ref, potentially not up to date?) for more info on what objects/functions/etc eslint-scope provides.

pionxzh commented 1 year ago

Babel and eslint are handling it correctly; the issue is from ast-types. The reason why I didn't use them is to maintain only one ast tool been used. We can either fix ast-types ourselves, and potentially PR back to the upstream. Using Babel or other tools for scoping is doable, but it will be hard to maintain, and there will be some performance punishment. I will check some references you provided this weekend to see what can we do.

0xdevalias commented 1 year ago

Babel and eslint are handling it correctly; the issue is from ast-types

@pionxzh Yup. I just wanted to test/ensure that the other suggestions I was making actually handled it properly.


The reason why I didn't use them is to maintain only one ast tool been used.

@pionxzh Yeah, that definitely makes sense. I wasn't suggesting them so much in a 'use multiple ast tools' way. More in a 'if we need to replace the current choice, what are some alternatives that handle it properly'


We can either fix ast-types ourselves, and potentially PR back to the upstream.

@pionxzh nods Part of the reason I was looking at some of the alternatives is that the upstream jscodeshift issue you linked seems to suggest that recast and ast-types are pretty unmaintained; and even that issue hasn't had any meaningful updates in more than a year, which makes me think jscodeshift is also equally unmaintained:

So my thinking was more in the space of "if the current underlying tools are unmaintained and buggy, what are the 'best modern alternative' choices we can use; and particularly ones that are quite heavily used by others, and so are likely to stay maintained/updated.


Using Babel or other tools for scoping is doable, but it will be hard to maintain, and there will be some performance punishment.

@pionxzh When you say hard to maintain/performance punishment; do you just mean if we were using multiple AST tools at the same time? Or do you mean by switching to one of those at all?


I will check some references you provided this weekend to see what can we do.

@pionxzh Sounds good :)

pionxzh commented 1 year ago

I mean it would be hard to maintain with multiple AST tools.

0xdevalias commented 1 year ago

Yup, makes sense. I'd definitely also recommend sticking to 1 for that reason.

0xdevalias commented 1 year ago

I'm curious, what benefits does jscodeshift (with the babylon / babel parser) / recast / ast-types give us compared to just using @babel/parser / @babel/traverse directly?

I asked ChatGPT, and it seemed to suggest that there is a 'higher level API' from jscodeshift; and that recast focuses on maintaining the original formatting/spacing of source code being modified (which since I believe we pretty print with prettier anyway, is probably irrelevant to the needs of this project)

I'm curious whether that 'higher level API' from jscodeshift actually simplifies things much compared to using babel's libraries directly?

ChatGPT Response Question: > A javascript AST project I am looking at is using jscodeshift with the babel / babylon parser. jscodeshift uses recast, which uses ast-types. > > What benefits would jscodeshift / recast / ast-types give us compared to just using @babel/parser / @babel/traverse directly Response: > Using `jscodeshift`, `recast`, and `ast-types` in conjunction with the Babel/Babylon parser offers several benefits compared to using `@babel/parser` and `@babel/traverse` directly. Here's an overview: > > ### jscodeshift > > 1. **Higher-Level API**: jscodeshift provides a more fluent and high-level API for transforming code. It's designed to make writing codemods (code modifications) simpler and more intuitive. > > 2. **Integration with Recast**: jscodeshift uses `recast` internally, which helps in retaining the original formatting and style of the code as much as possible. This is particularly useful when you want to apply transformations without altering the code's original format. > > 3. **Community and Ecosystem**: jscodeshift has a good community and ecosystem around it, with many predefined codemods and transformations that you can leverage. > > ### Recast > > 1. **Code Formatting Preservation**: Recast's primary feature is to parse your code and then print it back out while preserving the original formatting as much as possible. This is crucial when working on large codebases where consistency in formatting is vital. > > 2. **Compatibility with Various Parsers**: Recast is designed to be parser-agnostic, so it can work with different parsers like Esprima, Babel, and others. This flexibility allows you to choose the parser that best suits your needs. > > 3. **AST Manipulation**: Although `@babel/traverse` is powerful for AST traversal, recast adds an extra layer of utility in terms of manipulating the AST and reprinting it in a way that respects the original code's layout. > > ### ast-types > > 1. **AST Node Type Definitions**: `ast-types` is used by recast for defining and validating the structure of AST nodes. This helps in ensuring that the transformations you apply are structurally sound and conform to the expected AST node types. > > 2. **Fluent API for AST Construction**: `ast-types` provides a fluent API for constructing AST nodes, making it easier to generate new parts of the AST in your transformations. > > ### Comparison with @babel/parser and @babel/traverse > > - **Direct AST Manipulation**: `@babel/parser` and `@babel/traverse` offer more direct and low-level manipulation of the AST. While this gives you more control, it also requires a deeper understanding of ASTs and might involve more boilerplate code for common transformations. > > - **Formatting Considerations**: When using Babel tools directly, you often have to handle formatting separately (using tools like Prettier), as they don't inherently preserve the original code's formatting. > > - **Use Case Specificity**: If your primary goal is to write transformation scripts or codemods with a focus on maintaining code style and formatting, jscodeshift with recast and ast-types is a more suitable choice. For more general-purpose AST analysis and manipulation, `@babel/parser` and `@babel/traverse` might be adequate. > > In conclusion, the choice between these tools depends on your specific use cases, the importance of preserving code formatting, and the level of abstraction you prefer for AST manipulation. For more straightforward AST parsing and traversal, Babel's tools might suffice. For more complex transformations, particularly where code style preservation is crucial, jscodeshift combined with recast and ast-types offers a more specialized solution.

Edit: For my own reference, here are the code locations currently referencing jscodeshift, recast and/or ast-types:

I also note that there is a @wakaru/ast-utils package that could also potentially be used to centralise some 'leaky abstractions' from these libs if we were to refactor:

And it might also potentially make sense to refactor some of these utils into that ast-utils package as well maybe?

Edit 2: Raised a new issue for the 'refactor into @wakaru/ast-utils' potential mentioned above:

pionxzh commented 1 year ago

jscodeshift provides a query-like API that is easier to find and manipulate specific AST nodes than babel/traverse's traversal API. Adopting another tool (eg. babel/traverse) basically means rewriting everything, which is the thing that I do not want to spend time on. Currently the scoping issue is only happening on some edge cases. It's still acceptable for me. That's why I prefer to "patch" it instead of using a totally different tool.

0xdevalias commented 1 year ago

jscodeshift provides a query-like API that is easier to find and manipulate specific AST nodes than babel/traverse's traversal API.

nods that makes sense, and I suspected it was likely going to be something like that, but I wanted to confirm my assumptions (given that the current jscodeshift -> @babel/parser -> recast -> ast-types adds a lot of extra libraries/points of potential bugs/etc, particularly given jscodeshift/recast/ast-types are basically unmaintained currently)

I was basically thinking about what the 'main advantages' are that are being gained from each of the AST related libs; as that then makes it easier to think about whether there is a 'better' solution that still meets those needs/constraints.

pionxzh commented 1 year ago

I think jscodeshift's README provides a quite detailed description of each part of the components. 🤔

0xdevalias commented 1 year ago

I think jscodeshift's README provides a quite detailed description of each part of the components. 🤔

@pionxzh Yeah, in terms of jscodeshift. But I wanted to know what your personal reasoning for wanting to stick with it specifically was, and whether you cared about some of the 'strengths' that came along with the underlying libs (such as recast's focus on not modifying unrelated whitespace/etc; which seems irrelevant for wakaru since it prettifiers the end code anyway); which you answered for me above (simpler API + don't want to have to recode existing stuff for a new API)