Closed abmyii closed 2 years ago
I'll submit a PR to whichever repo if this solution is acceptable. Also, I'd appreciate any insight to the questions I had!
Thanks for looking at, reporting, and investigating this. I am a little short of time right now, but I'll go over this in detail and give detailed information and feedback when I have time, which I hope will be soon.
No problem, thank you very much for your quick reply and for this awesome program!
I just tried applying the change you suggested and while that no longer throws a KeyError exception, I am not getting a parse of the instructions and therefore no decompilation.
If you are getting a decompilation, then attach the output of running uncompyle6 using options -agT.
Otherwise we can start the discussion here, but let's continue this in decompyle3 because that will be the easier place to fix and once that's done the fix can be backported here.
It seemed like an interesting and simple-ish issue so I decided to investigate!
That's the spirit! I applaud you. Alas after looking at this, it looks like it is not as simple as we would have liked...
I don't understand why this JUMP doesn't have the "base" 4416 key,
This program is huge. A disassembly of it is about 2.7K lines with 2K instructions in the main routine. A disassembly will show that the instruction is:
2468 JUMP_ABSOLUTE 4416 (to 4416)
And to be able to get the large number 4416 as the operand value, an EXTENDED_ARG instruction needs to precede that instruction. It looks like this:
>> 2466 EXTENDED_ARG 17 (4352)
The "extended arg" instructions were rare in 2.7, but are now very common in Python 3.6 and above because the instruction size changed from 1 or 3 bytes to a fixed 2 bytes, and the one byte left for an operand is too small, especially in larger programs. The EXTENDED_ARG instruction wreaks havoc on a grammar-based parsing program like uncompyle6 or decompyle3 because now for every instruction there are possibly many forms of that instruction: the one without EXTENDED_ARG and those with one or more of them.
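To make the "many forms" point concrete, here is a minimal sketch of how an operand would be cut into one-byte pieces under the 3.6+ encoding. The helper name split_operand is made up for illustration and is not part of uncompyle6, decompyle3, or xdis.

```python
def split_operand(arg):
    """Split an operand into one-byte pieces, most significant first.

    In the 3.6+ format, every piece except the last would be carried by an
    EXTENDED_ARG instruction; the last piece is the operand of the real
    instruction. (Hypothetical helper, for illustration only.)
    """
    if arg < 0:
        raise ValueError("operand must be non-negative")
    pieces = [arg & 0xFF]
    arg >>= 8
    while arg:
        pieces.append(arg & 0xFF)
        arg >>= 8
    return pieces[::-1]


print(split_operand(200))    # one piece: no EXTENDED_ARG needed
print(split_operand(4416))   # two pieces: one EXTENDED_ARG prefix
print(split_operand(70000))  # three pieces: two EXTENDED_ARG prefixes
```

The same JUMP_ABSOLUTE can therefore show up as one, two, or three instructions depending only on how large its target is, which is what a grammar has trouble with.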
So what's done is we try to fold instructions with EXTENDED_ARG into one instruction. Of course the internal Python bytecode instruction object is not limited to one byte for jump addresses, so it can easily fit in, say, 4416 rather than have to represent this as 4352 in one instruction and 64 in the next. Also, if we were not to combine the two numbers, it would wreak havoc on the logic when we are trying to figure out where something jumps to.
But now, if we do this, what should we call the offset of the just-combined instructions? The offset is a string made of the offset of the first EXTENDED_ARG and the offset of the non-EXTENDED_ARG instruction, joined by an underscore. Here the offset value is 2466_2468.
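The folding described here can be sketched in a few lines. This is a simplified model written for this discussion, not the code in uncompyle6 or xdis; the function name and the tuple layout are assumptions.

```python
def fold_extended_args(instructions):
    """Fold EXTENDED_ARG prefixes into the instruction that follows them.

    `instructions` is a list of (offset, opname, raw_arg) tuples in the
    3.6+ encoding, where each raw_arg is a single byte.
    (Simplified illustration; not the real scanner code.)
    """
    folded = []
    pending = 0          # operand bits accumulated from EXTENDED_ARG prefixes
    first_offset = None  # offset of the first prefix in the current run
    for offset, opname, raw_arg in instructions:
        if opname == "EXTENDED_ARG":
            pending = (pending << 8) | raw_arg
            if first_offset is None:
                first_offset = offset
            continue
        arg = (pending << 8) | raw_arg
        # A folded instruction gets a string offset "first_last".
        label = offset if first_offset is None else f"{first_offset}_{offset}"
        folded.append((label, opname, arg))
        pending, first_offset = 0, None
    return folded


raw = [(2466, "EXTENDED_ARG", 17), (2468, "JUMP_ABSOLUTE", 64)]
print(fold_extended_args(raw))
```

With the two instructions from the disassembly above, 17 contributes 17 * 256 = 4352 and the jump's own byte contributes 64, giving a single JUMP_ABSOLUTE with operand 4416 and offset label 2466_2468.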
I hope this answers the questions here. As for what should be done and moving towards addressing this, let's move the discussion to decompyle3, where I'll post the remainder.
It seems I am running into a similar problem with another piece of bytecode, this time a Python 2.7 one, using uncompyle6 3.9.0a1 (source code from GitHub, current master branch).
By using @abmyii's trick to edit uncompyle6 source code to add debug stanzas, I managed to isolate the problematic instruction from the disassembled bytecode:
65587 JUMP_ABSOLUTE (to 65540)
The instructions were part of a for loop:
[...]
3796: >> 65533 SETUP_LOOP (to 65591)
65536 LOAD_GLOBAL (data)
65539 GET_ITER
>> 65540 FOR_ITER (to 65590)
65543 STORE_FAST (element)
3797: 65546 LOAD_FAST (element)
65549 LOAD_CONST (1)
65552 BINARY_SUBSCR
65553 LOAD_FAST (self)
65556 LOAD_ATTR (marker)
65559 COMPARE_OP (==)
65562 EXTENDED_ARG (65536)
65565 POP_JUMP_IF_FALSE (to 65584)
3798: 65568 LOAD_GLOBAL (data)
65571 LOAD_ATTR (remove)
65574 LOAD_FAST (element)
65577 CALL_FUNCTION (1 positional, 0 named)
65580 POP_TOP
65581 JUMP_FORWARD (to 65584)
>> 65584 EXTENDED_ARG (65536)
65587 JUMP_ABSOLUTE (to 65540)
>> 65590 POP_BLOCK
3799: >> 65591 SETUP_LOOP (to 65649)
[...]
The location where the problem declares itself, on a JUMP_ABSOLUTE, is preceded by an EXTENDED_ARG instruction.
IIUC, per documentation, EXTENDED_ARG's argument is supposed to contain a 2-byte value extending the value of the subsequent instruction, here JUMP_ABSOLUTE.
I was surprised to find the EXTENDED_ARG's value is exactly one more than the maximum value 2 bytes can hold (65535).
I found a code section in rocky/python-xdis which might be responsible for such a value, but the behaviour eludes me.
Of course the problem does not appear if there is no need for that EXTENDED_ARG, i.e. if the jump target offset is small enough to fit into 2 bytes.
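The 65536 in the listing is consistent with how Python 2.7 encodes large operands: the EXTENDED_ARG operand supplies the upper 16 bits, so a disassembler shows it already shifted left by 16. A small arithmetic sketch, assuming the raw EXTENDED_ARG operand in this bytecode is 1 (the helper name is made up for illustration):

```python
def combine_py27(extended_arg_operand, low_operand):
    """Combine a 2.7-style EXTENDED_ARG operand with the 16-bit operand
    of the instruction that follows it. (Illustrative helper only.)"""
    return (extended_arg_operand << 16) | low_operand


print(1 << 16)              # what EXTENDED_ARG with raw operand 1 contributes
print(combine_py27(1, 4))   # JUMP_ABSOLUTE target in the listing
print(combine_py27(1, 48))  # POP_JUMP_IF_FALSE target in the listing
```

Under that assumption the three values come out as 65536, 65540 and 65584, matching the EXTENDED_ARG annotation and the two jump targets shown in the loop.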
I was able to put together a few lines focused on that code section:
Thanks - should be fixed in 62760eb5
As for the xdis sequence decoding, I don't see anything wrong with that. Instructions abstract out EXTENDED_ARGS, which is a limitation of the bytecode format. That code is in service of that: it computes the offset value of the instruction.
Description
Attempting to decompile a tkinter script which was extracted from a PyInstaller executable. I got this error:
It seemed like an interesting and simple-ish issue so I decided to investigate! With this code below line 101, I found the problem.
https://github.com/rocky/python-uncompyle6/blob/451f0b55bba2ccb3b33611aa60adb31aa31e6bf9/uncompyle6/scanners/scanner38.py#L101
This was the output:
I noticed that there was no 4416 key - all of the keys had _... values. After a bit more digging I saw that it was being added by these lines:
https://github.com/rocky/python-uncompyle6/blob/451f0b55bba2ccb3b33611aa60adb31aa31e6bf9/uncompyle6/scanners/scanner37base.py#L329-L340
I don't understand why this JUMP doesn't have the "base" 4416 key, but I found a simple solution. I printed some other JUMP values and noticed that in every case - regardless of 1 or 3+ jumps with the same offset - the jump_back_index is self.offset2tok_index[last_index] - 1, so in this case, the last 4416 jump is '4416_4418' and thus jump_back_index = self.offset2tok_index['4416_4418'] - 1. I don't understand why, however. So, in short, I changed the code to get the jump_back_index in this way, and it fixed the problem:
From: https://github.com/rocky/python-uncompyle6/blob/451f0b55bba2ccb3b33611aa60adb31aa31e6bf9/uncompyle6/scanners/scanner38.py#L102
To:
And that fixes this problem.
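For readers following along, the KeyError can be pictured with a toy mapping. The sketch below is neither the change described above nor the fix that was later committed; the dictionary contents, the token index values, and the helper find_token_index are all invented for illustration.

```python
# Hypothetical, simplified stand-in for a scanner's offset-to-token-index map.
# The target of the jump, 4416, is only present as part of a folded key.
offset2tok_index = {4414: 1200, "4416_4418": 1201, 4420: 1202}


def find_token_index(mapping, target):
    """Return the token index for `target`, also accepting folded keys
    ("first_last" strings) whose first component is the target offset."""
    if target in mapping:
        return mapping[target]
    prefix = f"{target}_"
    for key, index in mapping.items():
        if isinstance(key, str) and key.startswith(prefix):
            return index
    raise KeyError(target)


print(4416 in offset2tok_index)                  # plain lookup would fail
print(find_token_index(offset2tok_index, 4416))  # found via the folded key
```

A plain offset2tok_index[4416] raises KeyError here because the only entry covering that offset is stored under the string '4416_4418'.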
This issue also applies to https://github.com/rocky/python-decompile3.
How to Reproduce
A link to the pyc file: https://gofile.io/?c=MV8jCW