[Question] Yak and A.I. ?

gab12 commented 8 months ago

Hello,

I've been using yak for years and love the work that's been done so far. Well done. However, IT is evolving and so are the tools. The AI revolution is now here.

I ran a test with a well-known AI: I passed it some code obfuscated by yak, and it gave me back the original code in a matter of seconds.

I think that with AI, high-level obfuscation is unfortunately obsolete, because I can't see any system that can defeat AI.

What do you think?

sedimentation-fault commented 7 months ago

Can you please add some more detail on this? Which "well-known AI" was it, and what code did you give it to de-obfuscate? Maybe it can resolve small, simple pieces of code, but does it also manage to de-obfuscate complex programs with hundreds, or even thousands, of lines?

And how did it know that, say, your obfuscated constant kvVIgcZOKDckqVxb was, say, LABEL_ORDERS? That's impossible, I dare say.

sedimentation-fault commented 7 months ago

> And how did it know that, say, your obfuscated constant kvVIgcZOKDckqVxb was, say, LABEL_ORDERS? That's impossible, I dare say.

It may very well be possible if you forgot to remove the yakpro-po comments. I always have a line like this at the end of the obfuscation process in my workflow:

find obfuscated-code-dir/ -type f \( -name '*.php' -o -name '*.css' \) -print0 | xargs -0 -P 5 -n 10 sed -i -f sed-script-yakpro-po-remove-comments

This finds all PHP/CSS files in the obfuscated-code-dir directory and runs sed on them in parallel (using xargs -P) with the sed script sed-script-yakpro-po-remove-comments. I attach the sed script for your convenience: sed-script-yakpro-po-remove-comments.txt

# NOTE: For this sed script to work, you must
# change config.php of yakpro-po to delimit its comments with
# 
# /*  |--------------------------------------------------|
# 
# and
# 
#     |--------------------------------------------------| */
# 
# instead of the original 
# 
# /*   __________________________________________________
# 
# and
# 
# */
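
The attached sed script is not reproduced in this thread. Purely as an illustration, the same idea can be sketched in Python, assuming the comments really are delimited by the |-----| bars described in the note. The function name strip_yak_comments and the sample comment text are made up for this example.

```python
import re

# Assumption: comments open with "/*", whitespace and a |-----| bar, and close
# with a |-----| bar, optional whitespace and "*/", as the note describes.
# The minimum dash count (10) is arbitrary.
_YAK_COMMENT = re.compile(r"/\*\s*\|-{10,}\|.*?\|-{10,}\|\s*\*/\n?", re.DOTALL)


def strip_yak_comments(source):
    """Remove every comment block that uses the custom bar delimiters."""
    return _YAK_COMMENT.sub("", source)


bar = "|" + "-" * 50 + "|"
# The comment body below is a placeholder, not real yakpro-po output.
sample = (
    "<?php\n"
    f"/*  {bar}\n"
    "    | placeholder comment text |\n"
    f"    {bar} */\n"
    "echo 1; /* kept */\n"
)
print(strip_yak_comments(sample))
```

Ordinary comments such as /* kept */ are left alone, because only blocks framed by the bar delimiters match. A block with an opening bar but no closing bar would make the pattern run on to the next closing bar, so this is a sketch rather than a safe replacement for the sed script.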

If you skip the above and leave the yakpro-po comments in your obfuscated code, a single command with the --whatis option is enough to de-obfuscate a name:

yakpro-po --config-file path-to/yakpro-po.cnf path-to/original-dir -o obfuscated-code-dir --whatis kvVIgcZOKDckqVxb

Of course, this needs the original, unobfuscated directory, but maybe someone wrote a script that bypasses this and de-obfuscates using only the yakpro-po comments, and that is what the AI tool might have used.

I am curious whether your AI tool can de-obfuscate a program with the comments removed as above.

gab12 commented 7 months ago

For information, the AI I use is ChatGPT 3.5.

Just ask it to unobfuscate the code and it does so. Obfuscation can discard the original variable names, so you need to extend the request by specifying that the variables should also be made readable.

It will then understand the code and restore (or rewrite) the variables in a very clear and readable way. I've run several tests and the results are impressive!

Other people I have spoken to, who tried it with other obfuscation tools, report impeccable results. Personally, I use yakpro's default configuration; I haven't tried deleting the comments, in case it doesn't do that by default. But I don't think that's a problem. I invite you to test it on your code :)

Note: I'm not comfortable explaining the deobfuscation method in detail. I'll edit the post in a few days to remove the explanation. The idea is to make people aware that obfuscation is no longer a way to secure one's code and can be broken in a few seconds by anyone, at least in my opinion. I think that low-level obfuscation tools like Zend Guard should still resist, but I've never tested it.

sedimentation-fault commented 7 months ago

> low-level obfuscation tools like Zend Guard should still resist, but I've never tested it.

Well, no, not really. First of all, Zend Guard encrypts the code. The problem is that PHP somehow has to decrypt the code to run it. Now, if your PHP engine can decrypt something, then you can too! All you need to do is re-compile PHP, changing the source code at some point to print what it has decrypted just before it executes it. That's why you will find services that will decrypt your code for a few $$.

Therefore, if we have a chance to protect PHP source code, obfuscation is the way to go, not encryption.

> It will then understand the code and restore (or rewrite) the variables in a very clear and readable way.

Again, you are not being clear enough. To stay with my example above, does it change kvVIgcZOKDckqVxb to your original name, say, LABEL_ORDERS? Or does it change it to something simpler, say L? It is easy to find all convoluted names in some code and change them to something simpler, that is, to change kvVIgcZOKDckqVxb to a one-letter constant L. Then you still have to understand the role of constant L in the code... unless ChatGPT "understands" that L is a constant that has to do with orders and uses a "meaningful" name, e.g. L_ORDERS or something.
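
The purely mechanical half of that, swapping each scrambled name for a short placeholder, can be sketched in a few lines of Python. This is an illustration under one assumption: scrambled names are exactly 16 identifier characters long, as in the examples in this thread. The function name shorten_names and the n1, n2, ... placeholders are made up for the example.

```python
import re

# Assumption: scrambled names are exactly 16 identifier characters long,
# like kvVIgcZOKDckqVxb. A genuine 16-character name would be renamed too.
_SCRAMBLED = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]{15}\b")


def shorten_names(code):
    """Replace each distinct scrambled name with n1, n2, ... in order of appearance."""
    mapping = {}

    def _replace(match):
        name = match.group(0)
        if name not in mapping:
            mapping[name] = f"n{len(mapping) + 1}"
        return mapping[name]

    return _SCRAMBLED.sub(_replace, code), mapping


sample = "if (kvVIgcZOKDckqVxb) { goto Ai0911ru0LZDUiLb; } Ai0911ru0LZDUiLb:"
short, names = shorten_names(sample)
print(short)  # if (n1) { goto n2; } n2:
print(names)
```

The output is shorter but carries no more meaning than before: nothing in the sketch can tell that n1 has anything to do with orders. That semantic step is the part in question here.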

Plus: yakpro-po can shuffle the order of the statements. I use it and the obfuscated code is full of gotos! Example:

        if (!(qbq1_k0qvpRaKW3_ && isset($_SESSION["\x44\x45\x42\125\107"]) && $_SESSION["\104\x45\102\125\x47"])) {
            goto Ai0911ru0LZDUiLb;
        }
        L0AqwMuafHCO0g1G::TzvDpfbljJrnOMwE($QY71R44X2_TIBqJA);
        Ai0911ru0LZDUiLb:
        if (!(T_huakxp5uXdH4Ab && isset($_SESSION["\x44\105\102\125\107"]) && $_SESSION["\x44\x45\x42\125\x47"] && $dyILjT9pOSH28I62 != array())) {
            goto TBSd5sIHxW5iIVlb;
        }

Works exactly as the original, but good luck understanding the logic! :-)

P.S. Feel free to post the unobfuscated version by ChatGPT of the above - it is not anything special. Just curious. I am too busy to do it myself right now...

pk-fr commented 7 months ago

To be very honest:

String obfuscation is pseudo-obfuscation and can easily be reverted.
Statement shuffling with gotos: I've heard of someone having coded an unshuffler...
One-line code: can also be reverted...
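
The first point is easy to check on the strings from the snippet posted earlier in this thread. Below is a minimal Python sketch, assuming only the \xHH (hex) and \NNN (octal) escape forms appear. decode_php_escapes is a hypothetical helper, not part of yakpro-po.

```python
import re

# Matches the two escape forms seen in the snippet: \xHH (hex) and \NNN (octal).
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{1,2})|\\([0-7]{1,3})")


def decode_php_escapes(text):
    """Replace \\xHH and \\NNN escapes with the characters they encode."""

    def _replace(match):
        hex_digits, octal_digits = match.groups()
        if hex_digits is not None:
            return chr(int(hex_digits, 16))
        # PHP keeps only the low byte of an oversized octal escape such as \400.
        return chr(int(octal_digits, 8) & 0xFF)

    return _ESCAPE.sub(_replace, text)


print(decode_php_escapes(r"\x44\x45\x42\125\107"))  # DEBUG
print(decode_php_escapes(r"\104\x45\102\125\x47"))  # DEBUG
```

Both spellings of the session key decode to DEBUG. The sketch ignores other escapes (such as \n or an escaped backslash), so it is a demonstration of the principle rather than a full PHP string parser.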

What can never be reverted, because the information is lost:

variable, function, class, method, etc. names!
comments...

That seems enough for me to say that obfuscation is the best way to protect code. For big projects with hundreds of source files, it makes it practically impossible for someone to understand the code.

X25guru commented 7 months ago

Any chance this will be updated so it can be used with modern PHP scripts? PHP 8 is here to stay, unfortunately.

carlosmintfan commented 4 months ago

The new GPT-4o model (maybe it would also work with GPT-3) works AWFULLY WELL at resolving gotos, at least in small code snippets: https://chatgpt.com/share/2e891b5c-6ad0-46a8-984b-4bbb414b7a17 (from a file from a learning platform developed for a friend of mine whose name is Emmanuel). It's... yeah. The AI, a machine, makes humans understand stuff that should only be understandable by machines.

pk-fr / yakpro-po

[Question] Yak and A.I. ? #121