j4k0xb / webcrack

Deobfuscate obfuscator.io, unminify and unpack bundled javascript
https://webcrack.netlify.app
MIT License
873 stars 100 forks

rename short identifiers #21

Closed by 3052 2 months ago

3052 commented 11 months ago

is it possible to correct short variable names? for example with JADX, it has this option:

 --deobf                             - activate deobfuscation

which will turn short variables such as a into a1234 or something, for easier searching

3052 commented 11 months ago

found another tool that works:

synchrony deobfuscate --rename obf.js

https://github.com/relative/synchrony

j4k0xb commented 11 months ago

there's currently the --mangle option for renaming but that does the opposite... turning _0x2d22bf into a. do you want every variable/param to have a unique name?

3052 commented 11 months ago

if the variable already has a "normal" name such as hello, we can probably just leave it as is. but if the name has been "minified" such as a, then it should be made longer. ideally variables should be different even when names could be reused, such as two variables in different scopes. this would prevent confusion thinking two variables are the same because of the identifier.
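
A minimal sketch of the rule described above, assuming that "minified" simply means a name of one or two characters. The helper names `looksMinified` and `createRenamer` are hypothetical and are not part of webcrack, JADX, or synchrony; the sketch also only knows about names it has been handed, so a real tool would first have to collect every existing identifier in the program.

```javascript
// Assumption: one- and two-character names count as "minified".
function looksMinified(name) {
  return name.length <= 2;
}

// Returns a function that hands out a name per *binding*, so two different
// variables never end up sharing an identifier, even across scopes.
function createRenamer(prefix = "v") {
  let counter = 0;
  const taken = new Set();
  return function rename(name) {
    if (!looksMinified(name) && !taken.has(name)) {
      taken.add(name);
      return name; // already readable and not used yet: leave it as is
    }
    let candidate;
    do {
      counter += 1;
      candidate = `${name}_${prefix}${counter}`;
    } while (taken.has(candidate));
    taken.add(candidate);
    return candidate;
  };
}

const rename = createRenamer();
console.log(rename("hello")); // hello
console.log(rename("a"));     // a_v1
console.log(rename("a"));     // a_v2 (a different binding that was also called `a`)
```

Keeping the original short name as a prefix is a design choice: it makes it easy to map the renamed output back to the minified source when searching.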

0xdevalias commented 10 months ago

When I was exploring this concept in my own deobfuscation PoC project, I was looking at making the variable names unique and having them carry some semantic information about their source/scope.

Eg. if it was an arg to a function, it might be arg_1. Or potentially if the function is foo, it might end up as foo_arg_1
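
As a rough illustration of that naming scheme (not the actual PoC code), the name could be built from a small description of the binding. The function `contextualName` and the `kind` / `functionName` / `index` fields are hypothetical, as is the `var` fallback prefix.

```javascript
// Hypothetical sketch: build a name from where a binding comes from.
function contextualName({ kind, functionName, index }) {
  const prefixes = { param: "arg", function: "func" };
  const prefix = prefixes[kind] ?? "var"; // assumption: everything else is a plain variable
  const parts = functionName ? [functionName, prefix, index] : [prefix, index];
  return parts.join("_");
}

console.log(contextualName({ kind: "param", index: 1 }));                      // arg_1
console.log(contextualName({ kind: "param", functionName: "foo", index: 1 })); // foo_arg_1
```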

It looks like most of the PoC code I was playing with was local/in a pretty messy/hacky state, but I did find a link in it to an online REPL I was playing around with some of it in. Not sure how outdated that code is, but it might be useful:

There were a number of different AST parsers I was playing around with, but I think that this babel code may have been the latest (not sure which one):

Within those files, I believe the functions getNameFromPath, getPrefix (and older commented out functions getTypePrefix, getPrefix


Edit: Came across this in another issue here:

I published my decompiler that I used in the above example. I think it might be a good reference for adding this feature. https://github.com/e9x/krunker-decompiler

Originally posted by @e9x in https://github.com/j4k0xb/webcrack/issues/10#issuecomment-1546764062

Looking at its libRenameVars code, it seems to take a vaguely similar approach to what I was doing in my original PoC described above:

A more generalised summary/overview (via ChatGPT):

Certainly, the code implements a sophisticated algorithm for renaming variables in a JavaScript program, adhering to several high-level rules and strategies:

  1. Type-Specific Prefixing:

    • The getVarPrefix function assigns specific prefixes to variable names based on their type (e.g., "func" for function names, "arg" for parameters). This approach helps in identifying the role of a variable just by its name.
  2. Avoiding Reserved Keywords:

    • The script includes a comprehensive list of reserved JavaScript keywords. If a variable's name matches a reserved keyword, it is prefixed with an underscore to prevent syntax errors.
  3. Unique Naming with Context Consideration:

    • The generateName function ensures that each variable gets a unique name that doesn't conflict with other variables in its scope. It also considers the context in which a variable is used. For example, if a variable is part of a class, it may receive a name that reflects this context, using PascalCase or camelCase as appropriate.
  4. Handling Special Cases:

    • The script contains logic to handle special cases, such as variables that are function expressions (isFuncVar) or class instances (isClass). This affects the naming convention applied to these variables.
  5. Randomness with Mersenne Twister:

    • A Mersenne Twister is used to generate random elements for variable names, ensuring that the names are not only unique within the scope of the program but also less predictable.
  6. AST-Based Renaming:

    • The script analyzes the Abstract Syntax Tree (AST) of the program to understand the structure and scope of variables. This analysis guides the renaming process, ensuring that the new names are consistent with the variable's usage and position in the code.
  7. Scope Analysis with ESLint Scope:

    • By leveraging eslint-scope, the script can accurately determine the scope of each variable. This is crucial in avoiding name collisions and ensuring that the renaming respects lexical scoping rules in JavaScript.
  8. Consideration for Exported and Assigned Variables:

    • The script pays special attention to variables that are exported or assigned in specific ways (e.g., through Object.defineProperty). It ensures that these variables receive names that are appropriate for their roles.

In summary, the script uses a combination of type-based naming conventions, context consideration, randomness, AST analysis, and scope analysis to systematically rename variables in a JavaScript program. This approach aims to enhance readability, avoid conflicts, and maintain the logical structure of the program.
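
To make a few of the rules in that summary concrete, here is a small sketch covering reserved-word escaping, type prefixes, and seeded randomness with a uniqueness check. It is not the krunker-decompiler code: `escapeReserved` and `generateNameSketch` are hypothetical names, the reserved-word list is a tiny subset, and mulberry32 (a small seeded PRNG) is swapped in for the Mersenne Twister purely to keep the example short.

```javascript
// A small subset of reserved words, for illustration only.
const RESERVED = new Set(["class", "default", "delete", "function", "new", "return", "var"]);

// Rule 2: prefix reserved words with an underscore.
function escapeReserved(name) {
  return RESERVED.has(name) ? `_${name}` : name;
}

// mulberry32: a tiny seeded PRNG, used here instead of a Mersenne Twister.
function mulberry32(seed) {
  let a = seed;
  return function () {
    a |= 0;
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Rules 1, 3 and 5: type prefix plus a seeded random suffix,
// retried until the name is unused in `scopeNames`.
function generateNameSketch(kind, scopeNames, random) {
  const prefix = { function: "func", param: "arg" }[kind] ?? "var";
  let name;
  do {
    const suffix = Math.floor(random() * 0x10000).toString(16).padStart(4, "0");
    name = `${prefix}_${suffix}`;
  } while (scopeNames.has(name));
  scopeNames.add(name);
  return name;
}

const scopeNames = new Set();
const random = mulberry32(42);
console.log(escapeReserved("class"));                          // _class
console.log(generateNameSketch("param", scopeNames, random));  // arg_ followed by 4 hex digits
```

Seeding the generator means the same input produces the same names on every run, which keeps diffs between two deobfuscation runs readable.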

0xdevalias commented 10 months ago

And for an even cooler/more extreme version of improving variable naming; I just came across this blog post / project from @jehna that makes use of webcrack + ChatGPT for variable renaming:

0xdevalias commented 10 months ago

I came across another tool today that seemed to have a start on implementing some 'smart rename' features:

Digging through the code led me to this:

There's also an issue there that seems to be exploring how to improve 'unmangling variable names' as well:

Which I wrote the following extra thoughts on:

I just finished up writing some thoughts/references for variable renaming on the webcrack repo, that could also be a useful idea for here. (see quotes below)

When I was exploring PoC ideas for my own project previously, I was looking to generate a file similar to the 'module map' that this project is using; but instead of just for the names of modules, I wanted to be able to use it to provide a 'variable name map'. Though because the specific variables used in webpack/etc can change between builds, my thought was that it would make sense to first 'normalise' them to a 'known format' based on their context.

That could then be later enhanced/expanded by being able to pre-process these 'variable name mappings' for various open source projects in a way that could then be applied 'automagically' without the end user needing to first create them.

It could also be enhanced by similar techniques such as what the humanify project does, by using LLMs/similar to generate suggested variable name mappings based on the code.

My personal ideal end goal for a feature like that would then allow me to use it within an IDE-like environment, where I can rename variables 'as I explore', knowing that the mappings/etc will be kept up to date.

Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/34#issuecomment-1807393509
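
The 'variable name map' idea quoted above could look roughly like the sketch below. The map shape (module id, then normalised name, then human-chosen name), the `module-1` key, and the `resolveName` helper are all assumptions for illustration; they are not the format used by wakaru or webcrack.

```javascript
// Hypothetical map shape: module id -> normalised name -> human-chosen name.
const variableNameMap = {
  "module-1": { func_1: "renderCarousel", func_1_arg_1: "props" },
};

// Look up a friendly name, falling back to the normalised name when unmapped.
function resolveName(map, moduleId, normalisedName) {
  const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
  if (has(map, moduleId) && has(map[moduleId], normalisedName)) {
    return map[moduleId][normalisedName];
  }
  return normalisedName;
}

console.log(resolveName(variableNameMap, "module-1", "func_1_arg_1")); // props
console.log(resolveName(variableNameMap, "module-1", "func_1_arg_2")); // func_1_arg_2 (unmapped)
console.log(resolveName(variableNameMap, "module-9", "func_1"));       // func_1 (unknown module)
```

Because the lookup key is the normalised name rather than the minified one, the same map would in principle survive a rebuild that reshuffles the minified identifiers.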

0xdevalias commented 10 months ago

Another link from my reference notes that I forgot to include earlier; my thoughts on how to rename otherwise unknown variables are based on similar concepts that are used in reverse engineering tools such as IDA:


And a few more I was looking at recently as well (which are basically a form of smart-rename):

  • https://binary.ninja/2023/09/15/3.5-expanded-universe.html#automatic-variable-naming
    • Automatic Variable Naming: One easy way to improve decompilation output is to come up with better default names for variables. There’s a lot of possible defaults you could choose and a number of different strategies are seen throughout different reverse engineering tools. Prior to 3.5, Binary Ninja left variables named based on their origin. Stack variables were var_OFFSET, register-based variables were regCOUNTER, and global data variables were (data). While this scheme isn’t changing, we’re being much more intelligent about situations where additional information is available.

      For example, if a variable is passed to a function and a variable name is available, we can now make a much better guess for the variable name. This is most obvious in binaries with type libraries.

    • This isn’t the only style of default names. Binary Ninja also will name loop counters with simpler names like i, or j, k, etc (in the case of nested loops)

  • https://github.com/Vector35/binaryninja-api/issues/2558

Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/34#issuecomment-1822263687
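
The loop-counter convention mentioned in the Binary Ninja notes above translates fairly directly to JavaScript deobfuscation. A minimal sketch, assuming the nesting depth of the loop is already known from the AST; `loopCounterName` and the `idx_N` fallback are hypothetical.

```javascript
// Hypothetical helper: nesting depth 0 -> i, 1 -> j, 2 -> k,
// then a numbered fallback once the conventional letters run out.
function loopCounterName(depth) {
  const letters = ["i", "j", "k"];
  return depth >= 0 && depth < letters.length ? letters[depth] : `idx_${depth}`;
}

console.log([0, 1, 2, 3].map((depth) => loopCounterName(depth)).join(", ")); // i, j, k, idx_3
```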

0xdevalias commented 9 months ago

Tangentially related to this issue, and in line with how wakaru implements 'smart-rename's (Ref) for certain things; I wonder if a similar concept could apply in webcrack.

Based on how all of the functions containing JSX seem to be named a variation of Component, I suspect there may already be some code doing this. (eg. Ref: 1, 2)


Regardless, the specific case I wanted to suggest here is when a React component sets Component.displayName: that value could be leveraged to 'smart-rename' the component identifier itself.

Unminifying this source file (Ref), in 63390.js, there are some React components that set the .displayName

// 63390.js, lines 191-194
var _Component67 = forwardRef(function (e, t) {
  return <div ref={t} className={_Z("relative flex h-full w-full overflow-hidden", e.className)}>{e.children}</div>;
});
_Component67.displayName = "CarouselContainer";

Contrasting this against wakaru's output (Ref):

Details

**Source (unpacked)**

```js
// module-63390.js, lines 279-283
var er = (0, d.forwardRef)(function (e, t) {
  return (0, o.jsx)("div", {
    ref: t,
    className: (0, l.Z)("relative flex h-full w-full overflow-hidden", e.className),
    children: e.children
  });
});
er.displayName = "CarouselContainer";
```

**Transformed (unminified)**

```js
// module-63390.js, lines 309-320
const CarouselContainer = forwardRef((props, ref) => (
  {props.children}
));
CarouselContainer.displayName = "CarouselContainer";
```
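
A deliberately naive sketch of the displayName idea, to show the shape of the transform rather than how wakaru or webcrack implement it. It works on source text with regular expressions, so it does not check for name collisions and will also rewrite matching tokens inside strings or comments; a real implementation would use a scope-aware AST rename (for example Babel's `scope.rename`) instead. The function names `renameByDisplayName` and `escapeRegExp` are hypothetical.

```javascript
// Escape characters that are special in a regular expression (e.g. `$` in identifiers).
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Find `<identifier>.displayName = "<Name>"` and rename <identifier> to <Name>.
function renameByDisplayName(code) {
  const assignment = /(?<![\w$.])([A-Za-z_$][\w$]*)\.displayName\s*=\s*"([A-Za-z_$][\w$]*)"/g;
  let result = code;
  for (const [, identifier, displayName] of code.matchAll(assignment)) {
    if (identifier === displayName) continue;
    // Match the identifier as a whole token, skipping property accesses like `x.er`.
    const usage = new RegExp(`(?<![\\w$.])${escapeRegExp(identifier)}(?![\\w$])`, "g");
    result = result.replace(usage, () => displayName);
  }
  return result;
}

const input = [
  "var er = (0, d.forwardRef)(function (e, t) { /* ... */ });",
  'er.displayName = "CarouselContainer";',
].join("\n");

console.log(renameByDisplayName(input));
// var CarouselContainer = (0, d.forwardRef)(function (e, t) { /* ... */ });
// CarouselContainer.displayName = "CarouselContainer";
```
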
0xdevalias commented 7 months ago

copilot now has a similar feature: https://code.visualstudio.com/updates/v1_87#_rename-suggestions (worth looking into how they've done it)

Originally posted by @j4k0xb in https://github.com/jehna/humanify/issues/8#issuecomment-1969984885


Release detailed here:

Couldn't see any overly relevant commits in that range, but did find the following by searching the issues manually:

Which led me to this label:

And these issues, which sound like there are 'rename providers' used by the feature:

More docs about rename providers here:

Based on the above, and the release notes explicitly mentioning copilot, I suspect the implementation will be in the Copilot extension itself (which isn't open-source):

⇒ file GitHub.copilot-1.168.741.vsix

GitHub.copilot-1.168.741.vsix: Zip archive data, at least v2.0 to extract, compression method=deflate

Though unzipping that and searching for provideRename didn't seem to turn up anything useful unfortunately.

Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/8#issuecomment-1970345566

0xdevalias commented 6 months ago

Continuing the context from above: it seems that this is implemented via a proposed VSCode API, NewSymbolNamesProvider:


It's less about "reverse engineering GitHub copilot" and more about "trying to figure out where the 'rename suggestions' change mentioned in the VSCode release notes was actually implemented, and what mechanism 'integrates' it into VSCode".

The above is assumptions + an attempt to figure that out; but if you're able to point me to the actual issue/commit on the VSCode side (assuming it was implemented there), or confirm whether it's implemented on the closed source GitHub Copilot extension side of things (if it was implemented there), that would be really helpful.

If it was implemented on the GitHub Copilot extension side of things, then confirming whether the VSCode extension 'rename provider' is the right part of the VSCode extension API to look at to implement a similar feature would be awesome.

Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/8#issuecomment-1977697935


Thank you for taking interest in this API. The rename suggestions feature is powered by a proposed API defined here. Extensions provide the suggestions, while VS Code shows them in the rename widget.

Originally posted by @ulugbekna in https://github.com/jehna/humanify/issues/8#issuecomment-1978471876