found another tool that works:
synchrony deobfuscate --rename obf.js
there's currently the --mangle option for renaming but that does the opposite... turning _0x2d22bf into a
do you want every variable/param to have a unique name?
if the variable already has a "normal" name such as hello, we can probably just leave it as is. but if the name has been "minified" such as a, then it should be made longer. ideally variables should be different even when names could be reused, such as two variables in different scopes. this would prevent the confusion of thinking two variables are the same because they share an identifier.
When I was exploring this concept in my own deobfuscation PoC project, I was looking at making the variable names unique and having them carry some semantic information about their source/scope.
Eg. if it was an arg to a function, it might be arg_1. Or potentially if the function is foo, it might end up as foo_arg_1.
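To make that naming scheme a bit more concrete, here is a minimal sketch in plain JavaScript. The helper names (contextualName, nameParams) are made up for illustration and are not taken from the PoC or from any of the tools mentioned here; a real version would read the binding kind and the enclosing function from the AST rather than take them as arguments.

```javascript
// Hypothetical helper: builds names like "arg_1" or "foo_arg_1" from a
// binding's role, its position, and the enclosing function's name (if any).
function contextualName(kind, index, enclosingFunctionName) {
  const base = `${kind}_${index}`;
  return enclosingFunctionName ? `${enclosingFunctionName}_${base}` : base;
}

// Hypothetical helper: names every parameter of a function, numbering from 1.
function nameParams(paramCount, enclosingFunctionName) {
  return Array.from({ length: paramCount }, (_, i) =>
    contextualName("arg", i + 1, enclosingFunctionName)
  );
}

console.log(nameParams(2));        // [ 'arg_1', 'arg_2' ]
console.log(nameParams(2, "foo")); // [ 'foo_arg_1', 'foo_arg_2' ]
```

Including the function name in the prefix means two parameters in different functions never end up sharing an identifier, which is the confusion mentioned earlier in the thread.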
It looks like most of the PoC code I was playing with was local/in a pretty messy/hacky state, but I did find a link in it to an online REPL I was playing around with some of it in. Not sure how outdated that code is, but it might be useful:
There were a number of different AST parsers I was playing around with, but I think that this babel code may have been the latest (not sure which one):
Within those files, I believe the functions getNameFromPath, getPrefix (and older commented out functions getTypePrefix, getPrefix
Edit: Came across this in another issue here:
I published my decompiler that I used in the above example. I think it might be a good reference for adding this feature. https://github.com/e9x/krunker-decompiler
Originally posted by @e9x in https://github.com/j4k0xb/webcrack/issues/10#issuecomment-1546764062
And looking at its libRenameVars code, it seems to be taking a vaguely similar approach to how I was looking at doing things in my original PoC that I described above:
- getVarPrefix will set a prefix based on the type (eg. func, arg, Class, imported, var)
- getName generates a new variable name that does not conflict with existing names or reserved keywords
- generateName generates a new name for a variable considering its scope, type, and the context in which it is used (e.g., whether it's a class, a function variable, etc.). It employs various AST manipulations to ensure the generated name is appropriate and does not conflict with existing names.
A more generalised summary/overview (via ChatGPT):
Certainly, the code implements a sophisticated algorithm for renaming variables in a JavaScript program, adhering to several high-level rules and strategies:
Type-Specific Prefixing:
- The getVarPrefix function assigns specific prefixes to variable names based on their type (e.g., "func" for function names, "arg" for parameters). This approach helps in identifying the role of a variable just by its name.
Avoiding Reserved Keywords:
- The script includes a comprehensive list of reserved JavaScript keywords. If a variable's name matches a reserved keyword, it is prefixed with an underscore to prevent syntax errors.
Unique Naming with Context Consideration:
- The generateName function ensures that each variable gets a unique name that doesn't conflict with other variables in its scope. It also considers the context in which a variable is used. For example, if a variable is part of a class, it may receive a name that reflects this context, using pascalCase or camelCase as appropriate.
Handling Special Cases:
- The script contains logic to handle special cases, such as variables that are function expressions (isFuncVar) or class instances (isClass). This affects the naming convention applied to these variables.
Randomness with Mersenne Twister:
- A Mersenne Twister is used to generate random elements for variable names, ensuring that the names are not only unique within the scope of the program but also less predictable.
AST-Based Renaming:
- The script analyzes the Abstract Syntax Tree (AST) of the program to understand the structure and scope of variables. This analysis guides the renaming process, ensuring that the new names are consistent with the variable's usage and position in the code.
Scope Analysis with ESLint Scope:
- By leveraging eslint-scope, the script can accurately determine the scope of each variable. This is crucial in avoiding name collisions and ensuring that the renaming respects lexical scoping rules in JavaScript.
Consideration for Exported and Assigned Variables:
- The script pays special attention to variables that are exported or assigned in specific ways (e.g., through Object.defineProperty). It ensures that these variables receive names that are appropriate for their roles.
In summary, the script uses a combination of type-based naming conventions, context consideration, randomness, AST analysis, and scope analysis to systematically rename variables in a JavaScript program. This approach aims to enhance readability, avoid conflicts, and maintain the logical structure of the program.
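As a rough illustration of the "type prefix + no conflicts + no reserved keywords" parts of that summary, here is a small sketch in plain JavaScript. It is not the krunker-decompiler code: the RESERVED list is deliberately partial, the PREFIXES table and the uniqueName function are hypothetical, a numeric counter stands in for the Mersenne Twister based randomness, and a reserved candidate gets a numeric suffix rather than an underscore prefix.

```javascript
// Partial list of reserved words, for illustration only.
const RESERVED = new Set([
  "class", "function", "var", "let", "const", "return", "new", "delete", "default",
]);

// Hypothetical prefix table, loosely modelled on the types listed above.
const PREFIXES = {
  function: "func",
  param: "arg",
  class: "Class",
  import: "imported",
  variable: "var",
};

// Returns a name for `kind` that is neither reserved nor already in `taken`,
// and records it in `taken` so later calls cannot reuse it.
function uniqueName(kind, taken) {
  const prefix = PREFIXES[kind] ?? "var";
  let candidate = prefix;
  let counter = 1;
  while (RESERVED.has(candidate) || taken.has(candidate)) {
    candidate = `${prefix}${counter++}`;
  }
  taken.add(candidate);
  return candidate;
}

const taken = new Set();
console.log(uniqueName("function", taken)); // func
console.log(uniqueName("function", taken)); // func1
console.log(uniqueName("variable", taken)); // var1 ("var" itself is reserved)
```

Here a single taken set stands in for proper scope analysis; the summary above describes eslint-scope doing that job, so that names only need to be unique where they could actually collide.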
And for an even cooler/more extreme version of improving variable naming; I just came across this blog post / project from @jehna that makes use of webcrack + ChatGPT for variable renaming:
Using LLMs to reverse JavaScript variable name minification
This blog introduces a novel way to reverse minified Javascript using large language models (LLMs) like ChatGPT and llama2 while keeping the code semantically intact. The code is open source and available at Github project Humanify.
Un-minify Javascript code using ChatGPT
This tool uses large language models (like ChatGPT & llama2) and other tools to un-minify Javascript code. Note that LLMs don't perform any structural changes – they only provide hints to rename variables and functions. The heavy lifting is done by Babel on AST level to ensure code stays 1-1 equivalent.
I came across another tool today that seemed to have a start on implementing some 'smart rename' features:
Digging through the code led me to this:
Rename minified identifiers with heuristic rules.
handleDestructuringRename, handleFunctionParamsRename, handlePropertyRename, handleReactRename, getElementName
generateName, getUniqueName
There's also an issue there that seems to be exploring how to improve 'unmangling variable names' as well:
Which I wrote the following extra thoughts on:
I just finished up writing some thoughts/references for variable renaming on the webcrack repo, that could also be a useful idea for here. (see quotes below)
When I was exploring PoC ideas for my own project previously, I was looking to generate a file similar to the 'module map' that this project is using; but instead of just for the names of modules, I wanted to be able to use it to provide a 'variable name map'. Though because the specific variables used in webpack/etc can change between builds, my thought was that it would make sense to first 'normalise' them to a 'known format' based on their context.
That could then be later enhanced/expanded by being able to pre-process these 'variable name mappings' for various open source projects in a way that could then be applied 'automagically' without the end user needing to first create them.
It could also be enhanced by similar techniques such as what the humanify project does, by using LLMs/similar to generate suggested variable name mappings based on the code.
My personal ideal end goal for a feature like that would then allow me to use it within an IDE-like environment, where I can rename variables 'as I explore', knowing that the mappings/etc will be kept up to date.
Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/34#issuecomment-1807393509
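The 'normalise first, then map' idea from the quote above could look something like the following sketch. Everything here is hypothetical: the resolveNames helper, the normalised name format (module42_func_1 etc.) and the human-readable names are invented for illustration and don't correspond to wakaru's module map format or to any existing file format.

```javascript
// Stage 1 (per build): minified name -> normalised, context-based name.
// Stage 2 (meant to survive across builds): normalised name -> human-chosen name.
// Names without a stage 2 entry fall back to their normalised form.
function resolveNames(minifiedToNormalised, normalisedToHuman) {
  const result = new Map();
  for (const [minified, normalised] of minifiedToNormalised) {
    result.set(minified, normalisedToHuman.get(normalised) ?? normalised);
  }
  return result;
}

// Hypothetical data: the same normalised key shows up in two builds even
// though the minifier picked different short names each time.
const buildA = new Map([["a", "module42_func_1"], ["b", "module42_func_1_arg_1"]]);
const buildB = new Map([["x", "module42_func_1"], ["y", "module42_func_1_arg_1"]]);
const variableNameMap = new Map([["module42_func_1", "renderCarousel"]]);

console.log(resolveNames(buildA, variableNameMap).get("a")); // renderCarousel
console.log(resolveNames(buildB, variableNameMap).get("x")); // renderCarousel
console.log(resolveNames(buildA, variableNameMap).get("b")); // module42_func_1_arg_1
```

The point of the two stages is that only the first one depends on the build; the second map is the part that could be pre-generated for open source projects or suggested by an LLM.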
Another link from my reference notes that I forgot to include earlier; my thoughts on how to rename otherwise unknown variables are based on similar concepts that are used in reverse engineering tools such as IDA:
In IDA’s disassembly, you may have often observed names that may look strange and cryptic on first sight: sub_73906D75, loc_40721B, off_40A27C and more. In IDA’s terminology, they’re called dummy names. They are used when a name is required by the assembly syntax but there is nothing suitable available.
IDA Help: Names Representation
Dummy names are automatically generated by IDA. They are used to denote subroutines, program locations and data. Dummy names have various prefixes depending on the item type and value
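For comparison with the JS case, the shape of those dummy names (a prefix for the item type plus a value) is easy to mimic. The sketch below only imitates the look of the names quoted above; it is not IDA's algorithm, and the dummyName function and the kind-to-prefix mapping are my own assumptions.

```javascript
// Assumed mapping from item kind to prefix, based only on the example names above.
const DUMMY_PREFIXES = { subroutine: "sub", location: "loc", offset: "off" };

// Builds a name like "sub_73906D75" from an item kind and a numeric address.
function dummyName(kind, address) {
  const prefix = DUMMY_PREFIXES[kind];
  if (!prefix) throw new Error(`unknown item kind: ${kind}`);
  return `${prefix}_${address.toString(16).toUpperCase()}`;
}

console.log(dummyName("subroutine", 0x73906d75)); // sub_73906D75
console.log(dummyName("location", 0x40721b));     // loc_40721B
console.log(dummyName("offset", 0x40a27c));       // off_40A27C
```

Minified JS has no addresses, so an equivalent scheme would need some other stable value in that slot, such as a per-scope counter.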
And a few more I was looking at recently as well (that is sort of basically smart-rename):
- https://binary.ninja/2023/09/15/3.5-expanded-universe.html#automatic-variable-naming
Automatic Variable Naming
One easy way to improve decompilation output is to come up with better default names for variables. There’s a lot of possible defaults you could choose and a number of different strategies are seen throughout different reverse engineering tools. Prior to 3.5, Binary Ninja left variables named based on their origin. Stack variables were var_OFFSET, register-based variables were regCOUNTER, and global data variables were (data). While this scheme isn’t changing, we’re being much more intelligent about situations where additional information is available.
For example, if a variable is passed to a function and a variable name is available, we can now make a much better guess for the variable name. This is most obvious in binaries with type libraries.
This isn’t the only style of default names. Binary Ninja also will name loop counters with simpler names like i, or j, k, etc (in the case of nested loops)
- https://github.com/Vector35/binaryninja-api/issues/2558
Originally posted by @0xdevalias in https://github.com/pionxzh/wakaru/issues/34#issuecomment-1822263687
Tangentially related to this issue, and in line with how wakaru implements 'smart-rename's (Ref) for certain things; I wonder if a similar concept could apply in webcrack.
Based on how all of the functions containing JSX seem to be named a variation of Component, I suspect there may already be some code doing this. (eg. Ref: 1, 2)
Regardless, the specific case I wanted to suggest here was when a React component sets the Component.displayName, and leveraging that to 'smart-rename' the component identifier itself.
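A very rough sketch of that displayName idea, using regexes on the source text purely to keep the example dependency-free. A real implementation in webcrack or wakaru would presumably work on the AST with proper scope handling; the function names collectDisplayNames and applyRenames are made up here, the sample input is a simplified stand-in, and the naive replacement would also touch same-named property accesses and does no collision checks.

```javascript
// Collects `<identifier>.displayName = "<name>"` assignments from source text.
// Only display names that are themselves valid identifiers are picked up.
function collectDisplayNames(source) {
  const pattern = /([A-Za-z_$][\w$]*)\.displayName\s*=\s*(["'])([A-Za-z_$][\w$]*)\2/g;
  const renames = new Map();
  for (const match of source.matchAll(pattern)) {
    renames.set(match[1], match[3]);
  }
  return renames;
}

// Naive whole-identifier replacement (no scope analysis, no collision checks).
function applyRenames(source, renames) {
  return source.replace(/[A-Za-z_$][\w$]*/g, (id) => renames.get(id) ?? id);
}

const sample = [
  "var _Component67 = forwardRef(function (e, t) { return null; });",
  '_Component67.displayName = "CarouselContainer";',
].join("\n");

const renames = collectDisplayNames(sample);
console.log(applyRenames(sample, renames));
// var CarouselContainer = forwardRef(function (e, t) { return null; });
// CarouselContainer.displayName = "CarouselContainer";
```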
Unminifying this source file (Ref), in 63390.js, there are some React components that set the .displayName:
// 63390.js, lines 191-194
var _Component67 = forwardRef(function (e, t) {
  return <div ref={t} className={_Z("relative flex h-full w-full overflow-hidden", e.className)}>{e.children}</div>;
});
_Component67.displayName = "CarouselContainer";
Contrasting this against wakaru's output (Ref):
copilot now has a similar feature: https://code.visualstudio.com/updates/v1_87#_rename-suggestions worth looking into how they've done it
Originally posted by @j4k0xb in https://github.com/jehna/humanify/issues/8#issuecomment-1969984885
Release detailed here:
Couldn't see any overly relevant commits in that range, but did find the following searching the issues manually:
Which led me to this label:
And these issues, which sound like there are 'rename providers' used by the feature:
More docs about rename providers here:
- https://code.visualstudio.com/docs/editor/editingevolved#_rename-symbol
- https://code.visualstudio.com/api/references/vscode-api#:~:text=registerRenameProvider(
- https://code.visualstudio.com/api/references/vscode-api#RenameProvider
RenameProvider
The rename provider interface defines the contract between extensions and the rename-feature.
prepareRename
Optional function for resolving and validating a position before running rename. The result can be a range or a range and a placeholder text. The placeholder text should be the identifier of the symbol which is being renamed - when omitted the text in the returned range is used.
provideRenameEdits
Provide an edit that describes changes that have to be made to one or many resources to rename a symbol to a different name.
- https://code.visualstudio.com/api/language-extensions/programmatic-language-features
- https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_prepareRename
Prepare Rename Request
The prepare rename request is sent from the client to the server to setup and test the validity of a rename operation at a given location.
- https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_rename
Rename Request
The rename request is sent from the client to the server to ask the server to compute a workspace change so that the client can perform a workspace-wide rename of a symbol.
- https://vshaxe.github.io/vscode-extern/vscode/RenameProvider.html
- https://github.com/vshaxe/vscode-extern/blob/master/src/vscode/RenameProvider.hx
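For reference, this is roughly what the two LSP requests linked above look like on the wire, sketched in plain JavaScript. The method names and parameter shapes follow my reading of the LSP 3.17 spec (textDocument/prepareRename takes a document and position, textDocument/rename additionally takes newName); the helper function names, the file URI and the new name are just examples, and this says nothing about how the Copilot feature itself is implemented.

```javascript
// Builds a JSON-RPC 2.0 request for textDocument/prepareRename.
function prepareRenameRequest(id, uri, line, character) {
  return {
    jsonrpc: "2.0",
    id,
    method: "textDocument/prepareRename",
    params: { textDocument: { uri }, position: { line, character } },
  };
}

// Builds a JSON-RPC 2.0 request for textDocument/rename.
function renameRequest(id, uri, line, character, newName) {
  return {
    jsonrpc: "2.0",
    id,
    method: "textDocument/rename",
    params: {
      textDocument: { uri },
      position: { line, character }, // zero-based line and character
      newName,
    },
  };
}

// Frames a message with the Content-Length header used by the LSP base protocol.
function frame(message) {
  const body = JSON.stringify(message);
  return `Content-Length: ${Buffer.byteLength(body, "utf8")}\r\n\r\n${body}`;
}

// Example: ask to rename the symbol at line 0, character 4 of a (made up) file.
const request = renameRequest(2, "file:///tmp/obf.js", 0, 4, "carouselContainer");
console.log(frame(request));
```

The server's answer to the rename request is a workspace edit describing the text changes, which is the same role provideRenameEdits plays for a VSCode extension.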
Based on the above, and the release notes explicitly mentioning copilot, I suspect the implementation will be in the Copilot extension itself (which isn't open-source):
- https://marketplace.visualstudio.com/items?itemName=GitHub.copilot
Downloading that gives GitHub.copilot-1.168.741.vsix, which seems to just be a .zip file:
⇒ file GitHub.copilot-1.168.741.vsix
GitHub.copilot-1.168.741.vsix: Zip archive data, at least v2.0 to extract, compression method=deflate
Though unzipping that and searching for provideRename didn't seem to turn up anything useful unfortunately.
Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/8#issuecomment-1970345566
Continued context from above, it seems that this is implemented via a VSCode proposed API NewSymbolNamesProvider:
It's less about "reverse engineering GitHub copilot" and more about "trying to figure out where the 'rename suggestions' change mentioned in the VSCode release notes was actually implemented; and what mechanism 'integrates' it into VSCode".
The above is assumptions + an attempt to figure that out; but if you're able to point me to the actual issue/commit on the VSCode side (assuming it was implemented there), or confirm whether it's implemented on the closed source GitHub Copilot extension side of things (if it was implemented there), that would be really helpful.
If it was implemented on the GitHub Copilot extension side of things, then confirming whether the VSCode extension 'rename provider' is the right part of the VSCode extension API to look at to implement a similar feature would be awesome.
Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/8#issuecomment-1977697935
Thank you for taking interest in this API. The rename suggestions feature is powered by a proposed API defined here. Extensions provide the suggestions, while the vscode shows them in the rename widget.
Originally posted by @ulugbekna in https://github.com/jehna/humanify/issues/8#issuecomment-1978471876
is it possible to correct short variable names? for example with JADX, it has this option:
which will turn short variables such as a into a1234 or something, for easier searching