rocky / python-uncompyle6

A cross-version Python bytecode decompiler
GNU General Public License v3.0

Tentative fix for #439 #451

Closed andrem-eberle closed 1 year ago

andrem-eberle commented 1 year ago

I altered the code in scanner.py a bit to separate CONST from NAME vars. In fact, it seems this part was already there, but it had ended up garbled outside the loop?

rocky commented 1 year ago

This looks good. Since this is so simple, would you do just a little more and include a test for this?

One place a check could be added is to the existing test test/simple_source/expression/05_long_literals.py

Additional code might be:

# Check that we can distinguish names from strings in literal collections, e.g. lists.
# The list has to have more than 4 items to get accumulated in a collection
a = ["y", 'Exception', "x", Exception, "z"]
assert a[1] == "Exception"
assert a[3] == Exception

To update the existing bytecode in test/ run:

./add-test.py -r test/simple_source/expression/05_long_literals.py

andrem-eberle commented 1 year ago

Done and done.

rocky commented 1 year ago

@andrem-eberle Actually I just tried this out and this has problems.

The bottom of the decompilation looks like this:

    a = [
     "'y'", "'Exception'", "'x'", 'Exception', "'z'"]

There are extra quotes. Also, the assert statements were removed by the compiler because it figured these are universally true.

To get the asserts back, put each assert with the == inside a function; then Python won't be able to figure this out.
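A minimal sketch of that suggestion, assuming a hypothetical function name check_literal_names (it is not taken from the actual test file). It only shows the shape of the test: the list and both asserts move into a function body, and the function is called once at module level.

```python
def check_literal_names():
    # More than 4 items, so the list is accumulated as a collection.
    a = ["y", "Exception", "x", Exception, "z"]
    assert a[1] == "Exception"  # a string constant
    assert a[3] == Exception    # a name (the built-in class)
    return a


check_literal_names()
```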

As for the extra quotes, that is somewhere in the semantic actions.

andrem-eberle commented 1 year ago

This is odd. I can't seem to reproduce the extra quotes here, using the latest xdis version and the forked uncompyle6. I am using the pip version of spark; could it be related?

rocky commented 1 year ago

Probably not spark related. And the problem is not in xdis either. I tried disassembling to check.

I also tried running from Python 3.8 (and not 3.8).

When I get a chance I will try to see where I am getting the extra quotes. However, in the disassembly I can definitely see that the assert statements are gone from the bytecode.
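One way to check whether asserts survived compilation is to look at the bytecode with the standard dis module. The sketch below is only an illustration: it uses the optimize argument of compile() as a stand-in to produce bytecode with and without asserts, which is not necessarily why the asserts disappeared from the test bytecode here. The helper name has_assert is hypothetical, and it relies on a compiled assert ending in a RAISE_VARARGS instruction.

```python
import dis

SOURCE = '''
a = ["y", "Exception", "x", Exception, "z"]
assert a[1] == "Exception"
assert a[3] == Exception
'''


def has_assert(code):
    """Return True if the code object contains a raise, as a compiled assert does."""
    return any(ins.opname == "RAISE_VARARGS" for ins in dis.get_instructions(code))


# optimize=0 keeps assert statements; optimize=1 strips them.
kept = compile(SOURCE, "<example>", "exec", optimize=0)
stripped = compile(SOURCE, "<example>", "exec", optimize=1)

print(has_assert(kept))
print(has_assert(stripped))
```

Calling dis.dis(kept) and dis.dis(stripped) prints the full listings if a side-by-side comparison is more convenient.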

andrem-eberle commented 1 year ago

Ah, I see. Sorry, I have been focusing on 2.7 and forgot about 3.x entirely. It does decompile wrong for code compiled with 3.8. In fact, the else block is decompiled wrong here in 3.8 (the values assignment is inserted into the else), which means the test doesn't fail anyway (even with forced asserts), since it never gets into the else. The Exception test is also inside the else in the decompiled version:

if sys.version < (3, 0):
    pass
else:
    values = {'value1':a + 1,
     'value2':2,
     'value3':3,
...

I will look into the extra quotes.

andrem-eberle commented 1 year ago

Well, the cause is the str.__repr__() method: it adds the quotes in 3.8 but not in 2.7 (which is why I wasn't seeing this error in 2.7).
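As a small illustration in plain Python 3 (not uncompyle6 code), this is how repr() adds quotes, and how applying it to text that is already quoted gives the doubled form seen in the decompiled output above. That the value is already quoted when it reaches the formatting step is an assumption; the thread does not show that code.

```python
plain = "y"
print(str(plain))    # y
print(repr(plain))   # 'y'

# Assumption: the text already carries its quotes before repr() is applied again.
already_quoted = repr(plain)
print(repr(already_quoted))  # "'y'"
```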

One option is to check the Python source version in n_actions.py and use repr() or str() accordingly. I did some tests here and it worked. Is this kind of solution acceptable to you? If so, I will open another pull request.
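A rough sketch of what such a version check could look like. The function format_constant and its parameters are hypothetical and are not the actual n_actions.py code; which branch uses str() and which uses repr() is inferred from the description above and should be treated as an assumption.

```python
def format_constant(text, source_version):
    """Hypothetical helper: pick str() or repr() by the source Python version."""
    if source_version >= (3, 0):
        # Assumed: for 3.x sources the text already carries its quotes,
        # so str() is used to avoid adding a second pair.
        return str(text)
    # Assumed: for 2.x sources the text arrives unquoted, so repr() adds the quotes.
    return repr(text)


print(format_constant("'y'", (3, 8)))
print(format_constant("y", (2, 7)))
```

Under these assumptions both calls produce the same single-quoted text, which is the form the decompiled list items should have.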

rocky commented 1 year ago

Yes, this is good - thanks.