jehna / humanify

Deobfuscate JavaScript code using ChatGPT
MIT License

Suggestions: Alternative Models, Batch and Auto-renaming #84

Open neoOpus opened 2 months ago

neoOpus commented 2 months ago

Hi Jesse!

I greatly appreciate your ongoing work on this deobfuscation tool. I'm currently trying to reverse a minified, obfuscated extension that its authors have abandoned. Unfortunately, the JavaScript files are quite large. With Gemini, processing stalls at about 7% or shows no progress at all for several hours. Running local models on my old machine likewise showed no progress for several hours, even with --verbose. The only error I see is about the "punycode" module being deprecated, so I'm not sure what the actual problem is.

I know your tool has significant potential, and with a few enhancements it could serve even more users who want to modify and maintain older extensions. Integrating additional models such as the Perplexity API ($5 of free API credit every month with Pro), Meta Llama 3.1 (free), the Groq API (very fast, supports several models), or Claude 3.5 Sonnet (very good at code) would really open up possibilities for those of us dealing with complex files.

This matters because each run takes a long time. I could write a batch file, but it would be hard to maintain for ongoing projects where I regularly need to deobfuscate updated JS files from other Chrome extensions I'm modifying to add fixes or features. Just reviewing the differences in a comparison tool, and dismissing the ones caused only by the model naming variables differently, can take hours, let alone finding the modifications (patches) I actually need. I hope this makes sense.

Suggested To-Do List: Improving the Deobfuscation Tool

  1. Explore New Model Integrations

    • Consider adding support for models like Meta Llama 3.1, Groq API, HuggingFace, and Claude. These could enhance performance and provide more versatility for users.
  2. Implement Multiple File Input

    • Allow users to add several JavaScript files via the command line.
    • Enable recursive folder lookup for JavaScript files to streamline the process.
  3. Auto-Rename Deobfuscated Files

    • Introduce an option to auto-rename deobfuscated files or save them in a separate folder with different names (ideally retaining the original name or with a defined prefix/suffix). This will help users keep track of versions.
  4. Enhance Performance

    • Investigate options to improve processing speed and efficiency, especially for larger JavaScript files.
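For items 2 and 3, here is a minimal sketch in TypeScript (Node.js) of recursive file discovery plus output naming. `findJsFiles` and `outputPathFor` are hypothetical helpers, not existing humanify options; skipping `node_modules` and the default `.deobf` suffix are my own assumptions.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Recursively collect .js files under `root`, skipping node_modules folders.
// Hypothetical helper for illustration; humanify does not ship this.
function findJsFiles(root: string): string[] {
  const results: string[] = [];
  for (const entry of fs.readdirSync(root, { withFileTypes: true })) {
    const full = path.join(root, entry.name);
    if (entry.isDirectory()) {
      if (entry.name === "node_modules") continue;
      results.push(...findJsFiles(full));
    } else if (entry.isFile() && entry.name.endsWith(".js")) {
      results.push(full);
    }
  }
  // Sort so repeated runs process files in the same order.
  return results.sort();
}

// Map an input file to an output path that mirrors its location relative to
// `inputRoot` under `outDir`, inserting `suffix` before the extension.
// The ".deobf" default is an assumption, not an existing convention.
function outputPathFor(
  inputFile: string,
  inputRoot: string,
  outDir: string,
  suffix = ".deobf",
): string {
  const rel = path.relative(inputRoot, inputFile);
  const parsed = path.parse(rel);
  return path.join(outDir, parsed.dir, `${parsed.name}${suffix}${parsed.ext}`);
}
```

Each path returned by `findJsFiles(root)` could then be processed one at a time (e.g. by invoking humanify once per file) and written to `outputPathFor(file, root, "deobfuscated")`. Keeping the relative folder layout means two runs on different versions of an extension can be compared folder against folder.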

Implementing the suggestions above could make a substantial difference in usability and performance. Thank you for considering these enhancements, and keep up the great work; your efforts are genuinely invaluable to the community!

jehna commented 2 months ago

Hey! Very good suggestions, thank you for those.

A quick reply about the models: this library relies heavily on a specific API feature, forced output, which most cloud providers I've looked at don't seem to offer. I'd guess this will change in the near future, as OpenAI just recently introduced JSON output mode and I'm sure many others will follow.

At the moment, Groq and Llama 3.1 (at least through Azure) are not compatible with humanify's requirements. Claude doesn't seem to have it either (I'm not sure whether forcing the choice of tool in advance would work). I'll add them as soon as they support grammars or forced JSON output. Meta.ai is not yet available for my country; are there other good places to find a hosted version?
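To illustrate why forced output matters: the tool needs a reply it can parse mechanically, not prose like "Sure! I'd call it userCount." Below is a minimal TypeScript sketch of the checks a caller has to do when the output is not forced. The `{"newName": ...}` reply shape and `parseRenameReply` are hypothetical, not humanify's actual schema, and the identifier check is ASCII-only for brevity.

```typescript
// Hypothetical reply shape; humanify's real prompt and schema may differ.
interface RenameReply {
  newName: string;
}

// Reserved words and literals that cannot be used as variable names
// (roughly the strict-mode list; contextual keywords included to be safe).
const RESERVED = new Set([
  "break", "case", "catch", "class", "const", "continue", "debugger",
  "default", "delete", "do", "else", "enum", "export", "extends", "false",
  "finally", "for", "function", "if", "implements", "import", "in",
  "instanceof", "interface", "let", "new", "null", "package", "private",
  "protected", "public", "return", "static", "super", "switch", "this",
  "throw", "true", "try", "typeof", "var", "void", "while", "with", "yield",
  "await",
]);

// ASCII-only identifier pattern (simplification: JS also allows Unicode names).
const IDENTIFIER = /^[A-Za-z_$][A-Za-z0-9_$]*$/;

// Returns the suggested name, or null if the reply is unusable.
function parseRenameReply(raw: string): string | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // free-form prose instead of JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const name = (data as Partial<RenameReply>).newName;
  if (typeof name !== "string") return null; // wrong shape
  if (!IDENTIFIER.test(name) || RESERVED.has(name)) return null; // not a usable name
  return name;
}
```

With grammars or schema-enforced output, the provider guarantees the reply has the requested shape, so the caller is mostly left with the last check (is this a usable identifier?) instead of retrying on malformed replies.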

neoOpus commented 1 month ago

Ah, I understand about the models, thanks for the clarification. I'll keep an eye on that feature (JSON output), and I'll also send them requests through every communication channel they have to let them know it's in popular demand (using their own LLMs to write the messages, of course, hehe).

> Meta.ai is not yet available for my country; are there other good places to find a hosted version?

Please give me some time and I'll get back to you with some options. Meanwhile, you could try a VPN (ProtonVPN Free recommended). I'll first verify that any API I suggest supports what your script requires. Right now I think Perplexity could be a good option, but there are so many out there that I want to find the one that works best for everyone.

neoOpus commented 1 month ago

I appreciate your willingness to strengthen this project. I'm currently focused on understanding its internals so I can develop an effective workflow, as I see great potential here. I have some suggestions, but I want to make sure they're feasible first. One idea is to chain multiple LLMs, so that the output of one serves as the input for another, which transforms it or adapts it to the required format. This could reduce API usage on any single service and make the most of each one's free tier, which matters since not everyone can afford commercial use.
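A rough TypeScript sketch of that chaining idea. `Stage`, `runChain` and `withFallback` are hypothetical names, and the three stages at the bottom are stubs standing in for real API calls: the first provider always fails (as if its free quota were used up), the second drafts a name as free-form text, and the third reformats that text into JSON.

```typescript
// A stage takes the previous stage's text output and returns new text.
type Stage = (input: string) => Promise<string>;

// Run stages in order, feeding each output into the next stage.
async function runChain(stages: Stage[], input: string): Promise<string> {
  let current = input;
  for (const stage of stages) {
    current = await stage(current);
  }
  return current;
}

// Try each provider in turn and move on when one throws
// (e.g. a free-tier quota is exhausted); rethrow the last error if all fail.
function withFallback(providers: Stage[]): Stage {
  return async (input) => {
    let lastError: unknown;
    for (const provider of providers) {
      try {
        return await provider(input);
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError ?? new Error("no providers configured");
  };
}

// Hypothetical stand-ins for real model calls.
const quotaExceeded: Stage = async () => {
  throw new Error("429: quota exceeded");
};
const draftName: Stage = async (code) => `Suggested name: userCount (for ${code})`;
const formatAsJson: Stage = async (text) => {
  const match = /Suggested name: (\w+)/.exec(text);
  return JSON.stringify({ newName: match ? match[1] : null });
};
```

Running `runChain([withFallback([quotaExceeded, draftName]), formatAsJson], "let a = 0;")` resolves to `{"newName":"userCount"}`. One caveat: a stage that reformats free text into JSON does not give the same guarantee as forced output on the provider side, so its result would still need validating before use.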

neoOpus commented 1 month ago

I'm still looking for free alternatives, and I think Mixtral could be the one (https://docs.mistral.ai/capabilities/json_mode/). Codestral specifically might be the best fit, but I may be mistaken: https://mistral.ai/news/codestral/
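For reference, a minimal TypeScript sketch (Node 18+ for the global `fetch`) of what a JSON-mode request to Mistral could look like. The chat completions endpoint and `response_format: { type: "json_object" }` follow Mistral's JSON mode documentation linked above; the model name, prompt wording, and the helpers `buildJsonModeRequest` and `requestRename` are my own assumptions, and I haven't confirmed whether Codestral itself accepts JSON mode.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const MISTRAL_CHAT_URL = "https://api.mistral.ai/v1/chat/completions";

// Build a chat completions request body with JSON mode switched on.
function buildJsonModeRequest(model: string, messages: ChatMessage[]) {
  return {
    model,
    messages,
    response_format: { type: "json_object" as const },
  };
}

// Ask for a new variable name and return the model's raw JSON string.
// Model name and prompt text are assumptions for illustration only.
async function requestRename(apiKey: string, code: string): Promise<string> {
  const body = buildJsonModeRequest("mistral-large-latest", [
    {
      role: "system",
      // Asking for JSON explicitly in the prompt is good practice in JSON mode.
      content: 'Reply only with JSON of the form {"newName": "<identifier>"}.',
    },
    { role: "user", content: `Suggest a descriptive name for the variable in:\n${code}` },
  ]);
  const res = await fetch(MISTRAL_CHAT_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`Mistral API returned HTTP ${res.status}`);
  const data = (await res.json()) as {
    choices: { message: { content: string } }[];
  };
  return data.choices[0].message.content;
}
```

JSON mode only promises syntactically valid JSON, not a particular shape, so the returned string would still need checking (e.g. that `newName` exists and is a usable identifier) before humanify could apply it.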
