rocky / python-uncompyle6

A cross-version Python bytecode decompiler
GNU General Public License v3.0

`JUMP_ABSOLUTE` decompilation error #310

Closed abmyii closed 2 years ago

abmyii commented 4 years ago

Description

I was attempting to decompile a tkinter script that had been extracted from a PyInstaller executable, and got this error:

Traceback (most recent call last):
  File "~/.local/bin/uncompyle6", line 10, in <module>
    sys.exit(main_bin())
  File "~/.local/lib/python3.6/site-packages/uncompyle6/bin/uncompile.py", line 194, in main_bin
    **options)
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 327, in main
    do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 225, in decompile_file
    do_fragments=do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 144, in decompile
    co, out, bytecode_version, debug_opts=debug_opts, is_pypy=is_pypy
  File "~/.local/lib/python3.6/site-packages/uncompyle6/semantics/pysource.py", line 2531, in code_deparse
    co, code_objects=code_objects, show_asm=debug_opts["asm"]
  File "~/.local/lib/python3.6/site-packages/uncompyle6/scanners/scanner38.py", line 106, in ingest
    jump_back_index = self.offset2tok_index[jump_target] - 1
KeyError: 4416

It seemed like an interesting and simple-ish issue so I decided to investigate! By adding the following debugging code below line 101, I found the problem.

https://github.com/rocky/python-uncompyle6/blob/451f0b55bba2ccb3b33611aa60adb31aa31e6bf9/uncompyle6/scanners/scanner38.py#L101

if token.attr == 4416:
    print()
    print(vars(token))
    print({i: self.offset2tok_index[i] for i in self.offset2tok_index if '4416' in str(i)})

This was the output:

{'kind': 'JUMP_ABSOLUTE', 'has_arg': True, 'attr': 4416, 'pattr': 4416, 'offset': 2428, 'linestart': None, 'opc': <module 'xdis.opcodes.opcode_38' from '~/.local/lib/python3.6/site-packages/xdis/opcodes/opcode_38.py'>, 'op': 113}
{'4416_0': 2222, '4416_1': 2223, '4416_4418': 2224}

I noticed that there was no plain 4416 key: every key had a `_...` suffix. After a bit more digging I saw that these keys were being added by these lines:

https://github.com/rocky/python-uncompyle6/blob/451f0b55bba2ccb3b33611aa60adb31aa31e6bf9/uncompyle6/scanners/scanner37base.py#L329-L340

I don't understand why this JUMP doesn't have the "base" 4416 key, but I found a simple solution. I printed some other JUMP values and noticed that in every case - regardless of 1 or 3+ jumps with the same offset, the jump_back_index is self.offset2tok_index[last_index] - 1 - so in this case, the last 4416 jump is '4416_4418' and thus jump_back_index = self.offset2tok_index['4416_4418'] - 1. I don't understand why, however. So, in short, I changed the code to get the jump_back_index in this way, and it fixed the problem:

From: https://github.com/rocky/python-uncompyle6/blob/451f0b55bba2ccb3b33611aa60adb31aa31e6bf9/uncompyle6/scanners/scanner38.py#L102

To:

offset_instances = [inst for inst in self.offset2tok_index if str(jump_target) in str(inst)]
jump_back_index = self.offset2tok_index[offset_instances[-1]] - 1

And that fixes this problem.
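
One caveat with the substring test: `str(jump_target) in str(inst)` would also match unrelated keys such as `14416_0`. Below is a minimal sketch of a stricter lookup, assuming the keys are either plain integers or strings of the form `<offset>_<suffix>` as in the debug output above. `find_token_index` is a hypothetical helper, not part of uncompyle6, and picking the highest index is an assumption based on the "last entry" observation. It only illustrates the lookup; whether the resulting tokens parse is a separate question.

```python
def find_token_index(offset2tok_index, jump_target):
    """Return the token index for jump_target, tolerating composite keys.

    Hypothetical helper (not uncompyle6 API). Assumes keys are ints or
    strings such as "4416_4418", as seen in the debug output.
    """
    # Plain key present: nothing special to do.
    if jump_target in offset2tok_index:
        return offset2tok_index[jump_target]

    # Otherwise accept only keys whose first component is exactly the target.
    prefix = "%d_" % jump_target
    candidates = [
        index
        for key, index in offset2tok_index.items()
        if isinstance(key, str) and key.startswith(prefix)
    ]
    if not candidates:
        raise KeyError(jump_target)

    # Assumption: the entry with the highest token index is the "last" one.
    return max(candidates)


table = {"4416_0": 2222, "4416_1": 2223, "4416_4418": 2224, "14416_0": 9000}
jump_back_index = find_token_index(table, 4416) - 1
print(jump_back_index)  # 2223
```

Note that `14416_0` is ignored here, whereas the substring test would have picked it up.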

This issue also applies to https://github.com/rocky/python-decompile3.

How to Reproduce

$ uncompyle6 Main.pyc
Traceback (most recent call last):
  File "~/.local/bin/uncompyle6", line 10, in <module>
    sys.exit(main_bin())
  File "~/.local/lib/python3.6/site-packages/uncompyle6/bin/uncompile.py", line 194, in main_bin
    **options)
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 327, in main
    do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 225, in decompile_file
    do_fragments=do_fragments,
  File "~/.local/lib/python3.6/site-packages/uncompyle6/main.py", line 144, in decompile
    co, out, bytecode_version, debug_opts=debug_opts, is_pypy=is_pypy
  File "~/.local/lib/python3.6/site-packages/uncompyle6/semantics/pysource.py", line 2531, in code_deparse
    co, code_objects=code_objects, show_asm=debug_opts["asm"]
  File "~/.local/lib/python3.6/site-packages/uncompyle6/scanners/scanner38.py", line 106, in ingest
    jump_back_index = self.offset2tok_index[jump_target] - 1
KeyError: 4416
$

A link to the pyc file: https://gofile.io/?c=MV8jCW

abmyii commented 4 years ago

I'll submit a PR to whichever repo if this solution is acceptable. Also, I'd appreciate any insight to the questions I had!

rocky commented 4 years ago

Thanks for looking at, reporting and investigating. I am a little short of time right now, but I'll be going over this in detail and will give detailed information and feedback when I have time which I hope will be soon.

abmyii commented 4 years ago

No problem, thank you very much for your quick reply and for this awesome program!

rocky commented 4 years ago

I just tried applying the change you suggested and while that no longer throws a KeyError exception, I am not getting a parse of the instructions and therefore no decompilation.

If you are getting a decompilation, then attach the output of running uncompyle6 using options -agT.

Otherwise we can start the discussion here, but let's continue this in decompyle3 because that will be the easier place to fix and once that's done the fix can be backported here.

> It seemed like an interesting and simple-ish issue so I decided to investigate!

That's the spirit! I applaud you. Alas after looking at this, it looks like it is not as simple as we would have liked...

> I don't understand why this JUMP doesn't have the "base" 4416 key,

This program is huge. A disassembly of it is about 2.7K lines with 2K instructions in the main routine. A disassembly will show that the instruction is:

            2468 JUMP_ABSOLUTE          4416 (to 4416)

And to be able to get the large number 4416 as the operand value, an EXTENDED_ARG instruction needs to precede that instruction. It looks like this:

         >> 2466 EXTENDED_ARG             17 (4352)

The "extended arg" instructions were rare in 2.7, but are now very common in Python 3.6 and above because the instruction size changed from 1 or 3 bytes to a fixed 2 bytes; one byte for an operand is too small, especially in larger programs. The EXTENDED_ARG instruction wreaks havoc on a grammar-based parsing program like uncompyle6 or decompyle3 because now for every instruction there are possibly many forms of that instruction: the one without EXTENDED_ARG and those with one or more of them.

So what's done is we try to fold instructions with EXTENDED_ARG into one instruction. Of course the internal Python bytecode instruction object is not limited to one byte for jump addresses, so it can easily hold, say, 4416 rather than having to represent this as 4352 in one instruction and 64 in the next. Also, if we were not to combine the two numbers, it would wreak havoc on the logic when we are trying to figure out where something jumps to.

But now, if we do this, what should we call the offset of the just-combined instructions? The offset is just a string made of the first EXTENDED_ARG offset and the non-EXTENDED_ARG offset, joined by an underscore. Here the offset value is 2466_2468.
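
The folding described above can be sketched roughly as follows. This is a simplified illustration for the fixed 2-byte format of Python 3.6+, where each EXTENDED_ARG contributes 8 more high bits. The function name `fold_extended_args` and the tuple layout are assumptions made for the example; this is not the actual scanner code.

```python
def fold_extended_args(instructions):
    """Fold EXTENDED_ARG prefixes into the instruction that follows them.

    instructions: iterable of (offset, opname, raw_arg) tuples, where raw_arg
    is the single operand byte stored in the bytecode (3.6+ wordcode).
    Returns a list of (offset_label, opname, full_arg) tuples.
    """
    folded = []
    high_bits = 0            # operand bits accumulated from EXTENDED_ARG
    first_ext_offset = None  # offset of the first EXTENDED_ARG in a run

    for offset, opname, raw_arg in instructions:
        if opname == "EXTENDED_ARG":
            high_bits = (high_bits | raw_arg) << 8
            if first_ext_offset is None:
                first_ext_offset = offset
            continue

        full_arg = high_bits | raw_arg
        if first_ext_offset is None:
            label = offset  # ordinary instruction keeps its integer offset
        else:
            # combined instruction gets a string such as "2466_2468"
            label = "%d_%d" % (first_ext_offset, offset)
        folded.append((label, opname, full_arg))
        high_bits = 0
        first_ext_offset = None

    return folded


pair = [(2466, "EXTENDED_ARG", 17), (2468, "JUMP_ABSOLUTE", 64)]
print(fold_extended_args(pair))  # [('2466_2468', 'JUMP_ABSOLUTE', 4416)]
```

With the two instructions from the disassembly, 17 << 8 gives 4352, and 4352 | 64 gives the jump target 4416.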

I hope this answers the questions here. As for what should be done, and to move towards addressing this, let's move the discussion to decompyle3 where I'll post the remainder.

Berbe commented 2 years ago

Description

It seems I am running against a similar problem with another piece of bytecode, this time a Python 2.7 one, using uncompyle6 3.9.0a1 (source code from GitHub, current master branch).

Encountered error

```python
Traceback (most recent call last):
  File "/home/user/venv/uncompyle6/bin/uncompyle6", line 11, in <module>
    load_entry_point('uncompyle6', 'console_scripts', 'uncompyle6')()
  File "/home/user/python-uncompyle6/uncompyle6/bin/uncompile.py", line 197, in main_bin
    result = main(src_base, out_base, pyc_paths, source_paths, outfile,
  File "/home/user/python-uncompyle6/uncompyle6/main.py", line 305, in main
    deparsed = decompile_file(
  File "/home/user/python-uncompyle6/uncompyle6/main.py", line 216, in decompile_file
    decompile(
  File "/home/user/python-uncompyle6/uncompyle6/main.py", line 143, in decompile
    deparsed = deparse_fn(
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1376, in code_deparse
    deparsed.gen_source(
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1164, in gen_source
    self.text = self.traverse(ast, is_lambda=is_lambda)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 451, in traverse
    self.preorder(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 117, in preorder
    self.preorder(kid)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 110, in preorder
    func(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/n_actions.py", line 192, in n_classdef
    self.build_class(subclass_code)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1134, in build_class
    self.gen_source(ast, code.co_name, code._customize)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 1164, in gen_source
    self.text = self.traverse(ast, is_lambda=is_lambda)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 451, in traverse
    self.preorder(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 117, in preorder
    self.preorder(kid)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 112, in preorder
    self.default(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 872, in default
    self.template_engine(table[key.kind], node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 770, in template_engine
    self.preorder(node[index])
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 429, in preorder
    super(SourceWalker, self).preorder(node)
  File "/home/user/venv/uncompyle6/lib/python3.9/site-packages/spark_parser/ast.py", line 110, in preorder
    func(node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/n_actions.py", line 1017, in n_mkfunc
    self.make_function(node, is_lambda=False, code_node=code_node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/pysource.py", line 543, in make_function
    make_function2(self, node, is_lambda, nested, code_node)
  File "/home/user/python-uncompyle6/uncompyle6/semantics/make_function2.py", line 85, in make_function2
    code = Code(code, self.scanner, self.currentclass)
  File "/home/user/python-uncompyle6/uncompyle6/scanner.py", line 101, in __init__
    self._tokens, self._customize = scanner.ingest(co, classname, show_asm=show_asm)
  File "/home/user/python-uncompyle6/uncompyle6/scanners/scanner2.py", line 420, in ingest
    j = self.offset2inst_index[offset]
KeyError: 65587
```

Investigation

By using @abmyii's trick to edit uncompyle6 source code to add debug stanzas, I managed to isolate the problematic instruction from the disassembled bytecode:

65587 JUMP_ABSOLUTE        (to 65540)

The instructions were part of a for loop:

[...]
3796:     >> 65533 SETUP_LOOP           (to 65591)
            65536 LOAD_GLOBAL          (data)
            65539 GET_ITER
         >> 65540 FOR_ITER             (to 65590)
            65543 STORE_FAST           (element)

3797:       65546 LOAD_FAST            (element)
            65549 LOAD_CONST           (1)
            65552 BINARY_SUBSCR
            65553 LOAD_FAST            (self)
            65556 LOAD_ATTR            (marker)
            65559 COMPARE_OP           (==)
            65562 EXTENDED_ARG         (65536)
            65565 POP_JUMP_IF_FALSE    (to 65584)

3798:       65568 LOAD_GLOBAL          (data)
            65571 LOAD_ATTR            (remove)
            65574 LOAD_FAST            (element)
            65577 CALL_FUNCTION        (1 positional, 0 named)
            65580 POP_TOP
            65581 JUMP_FORWARD         (to 65584)
         >> 65584 EXTENDED_ARG         (65536)
            65587 JUMP_ABSOLUTE        (to 65540)
         >> 65590 POP_BLOCK

3799:     >> 65591 SETUP_LOOP           (to 65649)
[...]

The location where the problem declares itself, on a JUMP_ABSOLUTE, is preceded by an EXTENDED_ARG instruction. IIUC, per documentation, EXTENDED_ARG's argument is supposed to contain a 2-byte value extending the value of the subsequent instruction, here JUMP_ABSOLUTE.

I was surprised to find that the EXTENDED_ARG's value (65536) is exactly one more than the maximum value 2 bytes can hold (65535). I found a code section in rocky/python-xdis which might be responsible for such a value, but the behaviour eludes me.

Of course the problem does not appear if there is no need for that EXTENDED_ARG, i.e. if the jump target offset is small enough to fit in 2 bytes.

Reproduction

I was able to put together a few lines focused on that code section:

Code

```python
data = [ [ "a", "b" ], [ "c", "d" ] ]

class Test:
    marker = "b"

    def test(self):
        for element in data:
            if element[1] == self.marker:
                data.remove(element)

test = Test()
print(data)
test.test()
print(data)
```

rocky commented 2 years ago

Thanks - should be fixed in 62760eb5

As for the xdis sequence decoding, I don't see anything wrong with that. Instructions abstract out EXTENDED_ARG, which is a limitation of the bytecode format. That code is in service of that, to compute the offset value of the instruction.
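
To illustrate with the numbers from the disassembly above: in Python 2.7 bytecode an operand is 16 bits wide, and EXTENDED_ARG supplies the next 16 bits, so an EXTENDED_ARG operand of 1 is displayed as 1 << 16 = 65536. A small sketch of that arithmetic follows. The helper name is made up for illustration, and the raw operand values 1, 4 and 48 are inferred from the displayed values, not read from the bytecode file.

```python
def combine_py27_arg(extended_arg, arg):
    """Combine a Python 2.7 EXTENDED_ARG operand with the following operand.

    Illustrative helper only: the extended part occupies the upper 16 bits,
    the ordinary 2-byte operand the lower 16 bits.
    """
    return (extended_arg << 16) | arg


print(combine_py27_arg(1, 0))   # 65536, the value shown next to EXTENDED_ARG
print(combine_py27_arg(1, 4))   # 65540, the JUMP_ABSOLUTE target (FOR_ITER)
print(combine_py27_arg(1, 48))  # 65584, the POP_JUMP_IF_FALSE target
```

So 65536 is not an overflowing 2-byte value, just the high-order part of the jump target before the low 16 bits are added.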