rocky / python-xdis

Python cross-version bytecode library and disassembler
GNU General Public License v2.0
282 stars 93 forks source link

Warn or fix whitespace in variable names #58

Open ZetaTwo opened 4 years ago

ZetaTwo commented 4 years ago

Continuing the discussion from here: https://github.com/rocky/python-uncompyle6/issues/312

@rocky directed me to post here instead.

It's possible to easily fool a lot of decompilers by renaming variables to names containing whitespace. For example, running this little script:

$ git clone https://github.com/ZetaTwo/python-obfuscator
$ cd python-obfuscator
$ ./test.sh

Will demonstrate how uncompyle6 can be tricked into decompiling the bytecode into valid code that does not reflect the actual bytecode.

My proposal is to add a small check that will at least warn about the presence of variables with invalid names and possinly even replace the invalid characters with for example "_".

I'm willing to make a PR with these changes but I would first just like to hear if I'm in the right place and if you have opinions on where/how this should be implemented?

rocky commented 4 years ago

Thanks for undertaking this. You are in the right place. We need more people helping out.

Some thoughts. First, there is something that needs to be done on the API side which is what the disassemblers would use. This relevant code is in xdis/main.py. And then adding a CLI option on for the standanalone pydisasm command. That is in xdis/bin/pydisasm.py.

There might be an option to warn but not fix and another setting to fix.

However blindlly replacing invalid characters with underscore probably won't be satisfying, because there could be another variable with that name. Since this kind of mapping is many to one (two distinct invalid characters map to the same character, underscore) you are even likely to get duplicates that way. So checking for existence of the variable before creating a name is desirable. Adding increasing number suffixes (and checking after the suffix is added) is one way to ensure uniqueness.

For completeness, I'll note that as far as assembly goes, invalid names is okay, just as it is in bytecode, because the interpreter doesn't use those names, just the index into the variable name array. But when you go say back to the source code, then the fact that two names are the same when they were different will cause a problem either in understanding the problem or running it from source again.

If you run into trouble or have questions, just ask.

ZetaTwo commented 4 years ago

Ok, I submitted a pull request and some questions regarding the fix. Please have a look.