Open Niocas opened 2 months ago
Here is one of the files I am trying to decompile.
I have now added it to python_3_12.cpp, but the output looks like this when running "pycdc item_data_2.pyc". Any ideas?
pycdas item_data_2.pyc outputs the following:
there is opcode 166 in your pyc. it is not a legal one according to cpython include/opcode.h (Python 3.12):
the direct answer here is: fixing <INVALID> from pycdc requires implementing a decompilation strategy in ASTree.cpp for the specific opcode/instruction, which is non-trivial. you can add the opcode to the case statement in ASTree.cpp just to get the tool to be quiet, but it often results in incorrect/incomplete python output.
examples of opcodes blocking successful python code generation (from "OH" pycs) include:
Since getting an ultra-trivial merge for a PR proved impossible (#511, nothing more than "testing the waters" here), I forked and stopped trying to work with the pycdc devs. Based on the title, this message is coming from that fork. That means you're also going to battle the fact that the original repo doesn't have complete opcode maps (pycdas doesn't produce 100% correct results for 3.11 or 3.12), and you may be asking devs to implement/investigate something they don't yet support in the main repo.
for example, according to the pycdc main repo "166" is not a valid opcode, but the cpython source code shows that it is "UNPACK_SEQUENCE_TUPLE":
#define UNPACK_SEQUENCE_TUPLE 166
you can see the response from @greenozon illustrating the problem you are going to face here.
i'm trying to be kind about this problem. the fact is we have binaries in the wild which contain opcodes which the pycdc project denies exist.
as for the code you're reversing, in most cases the modules containing bindict have no useful code. they contain a bindict and a call out to a native bindict module that i've not been able to locate (possibly it is packed inside the 50MB main exe, it doesn't exist anywhere in the pycs). the bindict format is essentially a table similar to NXFNs along with a trailing binary blob (which is not consistent between bindicts, which means it must be contextual). to illustrate what i mean, consider this pycdas result from another bindict file:
0 RESUME 0
2 LOAD_CONST 0: 0
4 LOAD_CONST 1: None
6 IMPORT_NAME 0: bindict
8 STORE_NAME 0: bindict
10 PUSH_NULL
12 LOAD_NAME 0: bindict
14 LOAD_ATTR 0: bindict
34 LOAD_CONST 2: b'\x01\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00abnormal_item_state\x0c\x00\x00\x00\x00\x01\x00\x00\x01\x96\x05\x02v\x01\x0b\x01\x0f\x17\xfd8\x18\x00\x00\x00\x89\xc0\x95\x12\t\x00'
36 UNPACK_SEQUENCE_TUPLE 1
40 CALL 1
50 STORE_NAME 1: data
52 LOAD_CONST 1: None
54 RETURN_VALUE
you can see this is basically just calling bindict.bindict(...), passing in the constant bytes/string shown in the disasm. this is basically the same in all files containing bindict data.
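to make the "table plus trailing blob" idea concrete, here is a minimal sketch that splits the constant from the disasm above into its name table and the undecoded remainder. the layout it assumes (a little-endian u32 count, then count + 1 u32 offsets, then the packed names) is only a guess based on the two blobs shown in this thread, not a documented format, and read_name_table is a made-up helper name.

```python
import struct

# Blob copied from the LOAD_CONST at offset 34 in the disassembly above.
BLOB = (
    b'\x01\x00\x00\x00\x00\x00\x00\x00\x13\x00\x00\x00abnormal_item_state'
    b'\x0c\x00\x00\x00\x00\x01\x00\x00\x01\x96\x05\x02v\x01\x0b\x01\x0f\x17'
    b'\xfd8\x18\x00\x00\x00\x89\xc0\x95\x12\t\x00'
)


def read_name_table(blob):
    """Split a bindict blob into (names, trailing_bytes).

    Assumed layout (inferred from the dumps in this thread, unverified):
      u32 count, then count + 1 u32 offsets, then the packed names.
    """
    (count,) = struct.unpack_from("<I", blob, 0)
    offsets = struct.unpack_from(f"<{count + 1}I", blob, 4)
    base = 4 + 4 * (count + 1)  # first byte after the offset array
    names = [
        blob[base + start:base + end].decode("ascii")
        for start, end in zip(offsets, offsets[1:])
    ]
    # Everything after the last name is the part that is still opaque.
    return names, blob[base + offsets[-1]:]


names, rest = read_name_table(BLOB)
print(names)  # ['abnormal_item_state']
print(rest)   # the trailing blob, still undecoded
```

the same assumption also fits the larger blob further down (7 names, 8 offsets), but the trailing bytes differ per file, so this only recovers the key names.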
the approximate py output from pycdc (if it were actually implemented rather than being denied) would look something like this:
# WIP opcode: UNPACK_SEQUENCE_TUPLE (bytecode=A6h) at position 36.
# Source Generated with Decompyle++
# File: abnormal_capture_rate_data.do.pyc (Python 3.12)
import bindict
data = bindict.bindict(b'\x07\x00\x00\x00\x00\x00\x00\x00\x10\x00\x00\x00 \x00\x00\x000\x00\x00\x00?\x00\x00\x00O\x00\x00\x00_\x00\x00\x00h\x00\x00\x00settlement_rate2settlement_rate4settlement_rate3max_capture_nummust_succeed_numsettlement_rate1init_rateG\x01\x00\x00\x00\x00\x02\x02\x01\x01\x06\x03\x04\x05\n\x06\x00\x12\x00"\x02\x12\x02"\x01\x12\x01"\x06"\x03\x01\x04\x01\x05"\x96\x0e*\xfc\xa9\xf1\xd2Mb`?\xfa~j\xbct\x93h?{\x14\xaeG\xe1zt?\xfc\xa9\xf1\xd2MbP?\x04c\xfc\xa9\xf1\xd2MbP?\x96\x0e\x15\x00\x00\x80?\x00\x00\x00\x00\x00\x00\x00\x00ffffff\xe6?\x02\x02\x9a\x99\x99\x99\x99\x99\xe9?\x96\x0e*{\x14\xaeG\xe1z\x84?\xb8\x1e\x85\xebQ\xb8\x8e?\x9a\x99\x99\x99\x99\x99\x99?\xfa~j\xbct\x93h?\x04c{\x14\xaeG\xe1zt?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xe9?\xcd\xcc\xcc\xcc\xcc\xcc\xec?\x00\x00\x00\x00\x9a\x99\x99\x99\x99\x99\xd9?\x03c333333\xe3?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xa9?333333\xb3?\x00\x00\x00>{\x14\xaeG\xe1zt?\x04c\x9a\x99\x99\x99\x99\x99\x99?\x96\x0e*333333\xe3?\x9a\x99\x99\x99\x99\x99\xe9?\xcd\xcc\xcc\xcc\xcc\xcc\xec?\x9a\x99\x99\x99\x99\x99\xc9?\x04c\x9a\x99\x99\x99\x99\x99\xd9?\x96\x0e\x1a\x9a\x99\x99\x99\x99\x99\xc9?333333\xd3?\x00\x00\x00?\x9a\x99\x99\x99\x99\x99\xa9?\x04c\x9a\x99\x99\x99\x99\x99\xb9?f\x0b\x07\x00\x00\x00\x00\x93\x01\x00\x00\x1bc\r4\x97\x01\x00\x006\xc6\x1ah\x8f\x01\x00\x00\xc99\xe5\x97\x85\x01\x00\x00R)(\x9c\x88\x01\x00\x00\xe4\x9c\xf2\xcb\x8b\x01\x00\x00m\x8c5\xd0\x82\x01\x00\x00\x11\x07$\x01\x02Q\x11\x05r\x01\x01\x9f\x01\x11\x03\xc8\x01\x01\x00\xf1\x01\x11\x01\x9e\x02\x00')
anyway, the short answer is resolving the issue requires updating ASTree.cpp (after fixing the incomplete opcode maps.)
@greenozon you might find this of interest:
https://github.com/wilson0x4d/pycdc/blob/wip/bytes/python_3_11.cpp
https://github.com/wilson0x4d/pycdc/blob/wip/bytes/python_3_12.cpp
i see no reason not to have entries for every opcode appearing in official cpython. trying to figure out what to keep and what to remove actually works against pycdc maintainers and end-users, and there is no harm in having entries that cpython's compile(...) would not produce. the mere fact that an opcode is represented in cpython source code at any point during the lifetime of a given version/branch is sufficient reason to include it (IMHO)
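a rough illustration of the "compile(...) would not produce" point, assuming CPython 3.11 or later: the dis module can show a function both as compiled and as the running interpreter currently holds it (adaptive=True). whether an instruction gets specialized, and what the specialized name is, depends on the interpreter version and build, so treat the second list as "may differ", not as a guarantee. unpack and opnames are hypothetical names for this sketch.

```python
import dis


def unpack(pair):
    a, b = pair
    return a + b


def opnames(func, adaptive):
    """List opcode names; the adaptive flag needs Python 3.11+."""
    try:
        instrs = dis.get_instructions(func, adaptive=adaptive)
    except TypeError:  # older interpreters lack the adaptive flag
        instrs = dis.get_instructions(func)
    return [ins.opname for ins in instrs]


# What compile() emitted: only "base" opcodes such as UNPACK_SEQUENCE.
as_compiled = opnames(unpack, adaptive=False)

# Warm the function up so the interpreter has a chance to specialize it.
for _ in range(1000):
    unpack((1, 2))

# What the interpreter holds now; may contain specialized variants.
at_runtime = opnames(unpack, adaptive=True)

print("compiled:", as_compiled)
print("runtime: ", at_runtime)
```

if the two lists differ, the extra names are exactly the kind of opcode that has a number in cpython source but never comes out of compile(...).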
i also have ASTree implementation code for a half dozen ops not pushed to my wip branch. would love it if i could work with people who understand how to work with the ast stack and frame logic better than i do.
Unsupported opcode: (bytecode=A6h) at position 36.
I am trying to decompile a Python 3.12 .pyc file, but it fails for nearly all files at bytecode "A6h". How can I fix that? I wrote a script with Python 3.12 that imports opcode and prints all opcodes, but it seems those are not all of them. What am I missing here, and how can I fix the decompiling process?
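One likely reason the printed list looks incomplete: opcode.opmap only holds the opcodes the compiler emits, while the header in CPython source also assigns numbers to specialized instructions. Below is a minimal sketch that merges both views and looks up 0xA6. It leans on dis._all_opmap, a private attribute that is assumed to exist on 3.11/3.12 and may be absent or different elsewhere, so the code falls back to the public map when it is missing. full_opcode_table is a made-up helper name.

```python
import dis
import opcode


def full_opcode_table():
    """Return {number: name} for public and (if available) specialized opcodes."""
    table = {num: name for name, num in opcode.opmap.items()}
    # dis._all_opmap is a private CPython detail (assumption: present on
    # 3.11/3.12); fall back to an empty mapping when it is missing.
    extra = getattr(dis, "_all_opmap", None) or {}
    for name, num in extra.items():
        table.setdefault(num, name)
    return table


table = full_opcode_table()
print(len(opcode.opmap), "public opcodes,", len(table), "entries in total")
print(hex(0xA6), "->", table.get(0xA6, "<not defined in this interpreter>"))
```

Run it with the same Python version that produced the .pyc, since opcode numbers change between releases.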