jehna / humanify

Deobfuscate Javascript code using ChatGPT
MIT License
1.37k stars 54 forks source link

Unsafe identifier renames #8

Open j4k0xb opened 9 months ago

j4k0xb commented 9 months ago

Minifiers only rename variable/function/parameter bindings and not any other identifier like properties because it's almost impossible to guarantee the program still works then.

Example input file:

class A {
  get() {
    return document.querySelector("#foo");
  }
}

humanify output:

class MyClass {
  getElementById() {
    return dom.findElement("#foo");
  }
}

I think thats the only change necessary but it needs some testing: https://github.com/jehna/humanify/blob/4dfe05e20f530a8919e6d1e9f89564be5bb874e0/src/openai/rename-variables-and-functions.ts#L16

- Identifier: (path) => { 
+ BindingIdentifier: (path) => { 

Or how about renaming all variables to something unique so it doesn't mix up minified ones and then supplying a list of binding names it is allowed to change in the prompt?

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed. Maybe use babel's scope-aware rename that also handles these conflicts (have to find a way of getting rid of the _ prefix): https://astexplorer.net/#/gist/7283141e13dab314521744603a95e9b7/05b11370db8d5ef257550b2916b87d6e72e4eb1d

0xdevalias commented 9 months ago

Or how about renaming all variables to something unique so it doesn't mix up minified ones and then supplying a list of binding names it is allowed to change in the prompt?

I've laid out some of my thoughts on this concept in general, in the following issues/comments:

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed.

Off the top of my head, if it's not easily solved via tweaking the prompt, you could potentially have it do a 'refinement/feedback round' where you provide feedback to the LLM + get it to correct the issues (which could be repeated up to X times) eg.

The following variables were already defined in the scope, so we can't use the names you chose. Please choose a new name for them so that they don't clash:

  • X
  • Y
  • Z
j4k0xb commented 7 months ago

copilot now has a similar feature: https://code.visualstudio.com/updates/v1_87#_rename-suggestions worth looking into how they've done it

0xdevalias commented 7 months ago

Release detailed here:

Couldn't see any overly relevant commits in that range, but did find the following searching the issue manually:

Which lead me to this label:

And these issues, which sound like there are 'rename providers' used by the feature:

More docs about rename providers here:

Based on the above, and the release notes explicitly mentioning copilot, I suspect the implementation will be in the Copilot extension itself (which isn't open-source):

⇒ file GitHub.copilot-1.168.741.vsix

GitHub.copilot-1.168.741.vsix: Zip archive data, at least v2.0 to extract, compression method=deflate

Though unzipping that and searching for provideRename didn't seem to turn up anything useful unfortunately; so not sure if that means it's not implemented in the copilot extension, it doesn't use VSCode's 'rename provider' API, or I just wasn't looking right.

isidorn commented 6 months ago

Hi VS Code PM here,

A team member pointed me to this issue because it links to one of our VS Code issues.

I wanted to share that reverse engineering GitHub copilot is violating the general terms of the use agreement https://github.com/customer-terms/general-terms

Feel free to reach out to me if I can help with the general terms clarification inikolic@microsoft.com

Thank you Isidor

0xdevalias commented 6 months ago

@isidorn It's less about "reverse engineering GitHub copilot" and more about "trying to figure out where the 'rename suggestions' change mentioned in the VSCode release notes was actually implemented; and what mechanism 'integrates' it into VSCode'".

The above is assumptions + an attempt to figure that out; but if you're able to point me to the actual issue/commit on the VSCode side (assuming it was implemented there), or confirm whether it's implemented on the closed source GitHub Copilot extension side of things (if it was implemented there), that would be really helpful.

If it was implemented on the GitHub Copilot extension side of things, then confirming whether the VSCode extension 'rename provider' is the right part of the VSCode extension API to look at to implement a similar feature would be awesome.

ulugbekna commented 6 months ago

Thank you for taking interest in this API. The rename suggestions feature is powered by a proposed API defined here. Extensions provide the suggestions, while the vscode shows them in the rename widget.

0xdevalias commented 6 months ago

@ulugbekna Awesome; thanks for pointing that out! Shall have a closer read :)

Edit: See also:

sagarika-fynd commented 6 months ago

I renamed all variables to unique names, and then ran the unminify code, most of the stuff was unminified, but i still had some random code, which wasnts, so i added a check to only rename variables which are in the pattern of my naming convention, and instructed the same to GPT, had to run this a few times, but it made sure sancity of previous naming convention was maintained

0xdevalias commented 1 month ago

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed.

See also:


Semi-related:

0xdevalias commented 1 day ago

Also it can happen that a variable gets renamed but another one with the same name already exists in the scope, or the code gets split up in such a way that not all references are renamed.

Off the top of my head, if it's not easily solved via tweaking the prompt, you could potentially have it do a 'refinement/feedback round' where you provide feedback to the LLM + get it to correct the issues (which could be repeated up to X times) eg.

The following variables were already defined in the scope, so we can't use the names you chose. Please choose a new name for them so that they don't clash:

  • X
  • Y
  • Z

Just made a similar suggestion for this pattern while debugging another issue, referencing it here for continuity:

The proper fix would be to use toIdentifier from @babel/types

@j4k0xb @jehna From a 2sec search I couldn't find rendered docs, but here's the relevant source for each:

toIdentifier definitely seems like a more robust approach than the current 'prefix with _' approach for sure.

Though I wonder if a 'proper fix' should also involve tweaking how we prompt for/filter the suggestions coming back from the LLM itself as well. Like instead of just forcing an invalid suggestion to be valid (with toIdentifier), we could detect that it's invalid (with isValidIdentifier) and then provide that feedback to the LLM, asking it to give a new suggestion; probably with some max retry limit; after which we could fall back to using the invalid suggestion run through toIdentifier, or log a warning and leave it un-renamed or similar.

Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/117#issuecomment-2381046017