zrax / pycdc

C++ python bytecode disassembler and decompiler
GNU General Public License v3.0

decompile py4 binary files - Bad MAGIC #316

Closed: milahu closed this issue 1 year ago

milahu commented 1 year ago
$ pycdc asdf.py4 
Bad MAGIC!
Could not load file asdf.py4

Example: the .py4 and .pyi files shipped with Talon (https://talonvoice.com/). Download https://talonvoice.com/dl/latest/talon-linux.tar.xz (42 MB); the files are under talon/resources/python/lib/python3.9/site-packages/talon/

From https://github.com/zrax/pycdc/issues/23:

Bad MAGIC!

That message indicates that the magic number (first 4 bytes of the pyc file) wasn't recognized.
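To make that check concrete: a stock CPython .pyc starts with a magic number made of a 2-byte little-endian version number followed by b"\r\n". The sketch below classifies the first 4 bytes of a file that way. It is an illustration under that assumption, not pycdc's actual check, and `describe_pyc_header` is a hypothetical helper name.

```python
import importlib.util


def describe_pyc_header(header: bytes) -> str:
    # Hypothetical helper. Assumption: a CPython magic number is a 2-byte
    # little-endian number followed by CR LF, so anything else cannot be
    # the header of a stock .pyc file.
    if len(header) < 4:
        return "too short"
    if header[2:4] != b"\r\n":
        return "not a CPython magic number"
    number = int.from_bytes(header[:2], "little")
    if header[:4] == importlib.util.MAGIC_NUMBER:
        return f"magic {number} (matches this interpreter)"
    return f"magic {number} (other CPython version)"


# First 4 bytes of one of the .py4 files from the xxd dump in this issue.
print(describe_pyc_header(bytes.fromhex("f327aece")))
# The magic number of the interpreter running this script.
print(describe_pyc_header(importlib.util.MAGIC_NUMBER))
```

A header that fails the CR LF test cannot be fixed by adding a new version number to a table; the file simply is not in the stock .pyc format.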

The .py4 header looks random:

$ ls *.py4 | xargs -n1 xxd -g4 -l16
00000000: f327aece 2e1b8254 0bb5f67f 388aff86  .'.....T....8...
00000000: 89c57d01 08b3b720 39f03e0b eb588fd1  ..}.... 9.>..X..
00000000: 254437ec 7354f4e1 0acdba04 4137e606  %D7.sT......A7..
00000000: ea76ba36 6543b889 a2cc0590 cdb48048  .v.6eC.........H
00000000: f80abeb0 b134be8e 921d62b2 3656ee1c  .....4....b.6V..
00000000: ad0255db 39bdb3ba 03a2da8f 12162183  ..U.9.........!.
00000000: ae5d46af 80c58e52 a44617c8 beb00d82  .]F....R.F......
00000000: 7a9ac544 66c8cb4f fd76c078 0895c566  z..Df..O.v.x...f
00000000: fd371ad1 0d824375 edc7f26b 9bb8c15e  .7....Cu...k...^
00000000: da49628e 522c05b9 c40c604d a3b18795  .Ib.R,....`M....
00000000: e65ff54f a1aacf32 3ca85ad7 8cb74b55  ._.O...2<.Z...KU
00000000: ce217e38 ace61573 72aca92c 2d234fca  .!~8...sr..,-#O.
00000000: 5c8761b9 ee0d8312 8684d79c c9720689  \.a..........r..
00000000: b7ed0327 b81f1d40 b8b0b5e2 70eea556  ...'...@....p..V
00000000: c96c537c 3db83cbc 0565fffe 566bf888  .lS|=.<..e..Vk..
00000000: 050b39af 18268880 adf7bc6f c7268486  ..9..&.....o.&..
00000000: 5483a773 2f0fb49a c7def83f 6427a9df  T..s/......?d'..
00000000: 61803524 9402dc73 2fedaeac e6a4ad14  a.5$...s/.......
00000000: b9627e55 df623d11 5a5e060c 6a17d254  .b~U.b=.Z^..j..T
00000000: fae1642a b3cb4165 769d857b 45ea8187  ..d*..Aev..{E...
00000000: c0238c70 6045ee7b 00549c97 4b6b69da  .#.p`E.{.T..Kki.
00000000: 6e944fc4 aa1f771a ef20cf7b 70779220  n.O...w.. .{pw. 
00000000: 89a4f681 b5394e52 daaf2122 666c8a61  .....9NR..!"fl.a
00000000: 9fb1c9d2 43c9ab51 516eee49 0622a70b  ....C..QQn.I."..
00000000: 445b542b 0ac58e0f 760d834c 854c27c6  D[T+....v..L.L'.
00000000: 025fe811 37c43bb5 9daf89cf 75f23436  ._..7.;.....u.46
00000000: 43237105 c857d907 687848df 9449cca3  C#q..W..hxH..I..
00000000: 96252cd9 f971218f 4da26e28 4fb89503  .%,..q!.M.n(O...
$ python3.9 -c "import talon; print(talon)"
<module 'talon' (namespace)>
$ python3.9 -c "import talon; print(talon.__loader__)"
<_frozen_importlib_external._NamespaceLoader object at 0x7f266fde39d0>
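One way to back up "looks random": .pyc files written by the same interpreter all begin with the identical 4-byte magic, while the .py4 files in the xxd dump each start with different bytes, and none of them has the b"\r\n" suffix a CPython magic number needs. A minimal sketch comparing the first word of five of the dumped .py4 files (copied from the output) with freshly compiled stock .pyc files; the helper names are hypothetical.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

# First 4 bytes of five .py4 files, copied from the xxd dump.
PY4_FIRST_WORDS = [bytes.fromhex(w) for w in
                   ("f327aece", "89c57d01", "254437ec", "ea76ba36", "f80abeb0")]


def stock_pyc_first_words(n: int = 3) -> set:
    # Compile n tiny modules with the running interpreter and collect
    # the first 4 bytes of each resulting .pyc file.
    words = set()
    with tempfile.TemporaryDirectory() as tmp:
        for i in range(n):
            src = pathlib.Path(tmp, f"mod{i}.py")
            src.write_text(f"x = {i}\n")
            pyc = py_compile.compile(str(src), cfile=str(src.with_suffix(".pyc")),
                                     doraise=True)
            words.add(pathlib.Path(pyc).read_bytes()[:4])
    return words


def has_crlf_suffix(word: bytes) -> bool:
    # Every CPython magic number ends in CR LF.
    return word[2:4] == b"\r\n"


stock = stock_pyc_first_words()
print("stock .pyc distinct headers:", len(stock))          # 1
print("py4 distinct headers:", len(set(PY4_FIRST_WORDS)))  # 5
print("py4 headers ending in CR LF:", sum(map(has_crlf_suffix, PY4_FIRST_WORDS)))  # 0
```

Different leading bytes per file fit the idea that each .py4 is encrypted or otherwise transformed as a whole, rather than carrying an unknown but fixed magic number.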

Trying to run the __init__.pyi file directly:

$ PYTHONPATH=$PYTHONPATH:$PWD firejail python3.9 talon/__init__.pyi
Traceback (most recent call last):
  File "/home/user/src/voice-control/talon-linux/talon/resources/python/lib/python3.9/site-packages/talon/__init__.pyi", line 1, in <module>
    from talon.scripting import Context as Context, Dispatch as Dispatch, Module as Module
ImportError: cannot import name 'Context' from 'talon.scripting' (unknown location)
$ grep Context talon/scripting/__init__.pyi 
from .context import Context as Context
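Both outputs are consistent with namespace packages: `talon` reports a `_NamespaceLoader`, and the ImportError gives `(unknown location)` for `talon.scripting`, which is what CPython prints when the module has no file. Stock CPython only treats a directory as a regular package if it finds an `__init__` file with a suffix it can load (for source files, `importlib.machinery.SOURCE_SUFFIXES`, which is `['.py']` on Linux), so a directory holding only `__init__.pyi` and `.py4` files becomes a namespace package and the stub is never executed on import. The sketch below reproduces that with `importlib.util.find_spec`; the helper and the throwaway package name are hypothetical, and it assumes Python 3.7+, where a namespace package's `spec.origin` is `None`.

```python
import importlib
import importlib.machinery
import importlib.util
import pathlib
import sys
import tempfile
import uuid


def package_spec_origin(filenames):
    # Hypothetical helper: build a throwaway package directory containing
    # only `filenames`, then ask the import system how it would load it.
    with tempfile.TemporaryDirectory() as tmp:
        name = "demo_pkg_" + uuid.uuid4().hex  # unique per call
        pkg_dir = pathlib.Path(tmp, name)
        pkg_dir.mkdir()
        for filename in filenames:
            (pkg_dir / filename).write_text("")
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            spec = importlib.util.find_spec(name)
        finally:
            sys.path.remove(tmp)
    # Namespace packages have no origin file (spec.origin is None on 3.7+).
    return spec.origin, spec.origin is None


print(importlib.machinery.SOURCE_SUFFIXES)
print(package_spec_origin(["__init__.pyi", "context.py4"]))  # (None, True)
print(package_spec_origin(["__init__.py"]))
```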

hmm ...

milahu commented 1 year ago

Never mind. This looks like Talon's own custom obfuscation.

Talon's obfuscation is probably similar to Dropbox's:

https://news.ycombinator.com/item?id=13848035

the encryption keys are not in the interpreter. The interpreter is patched to not expose co_code and more (to make this memory dumping more difficult; injecting a shared object is a different technique that I used too). It's also patched to use the different opcode mapping and the unmarshalling of pyc files upon loading them. However the key for each pyc file is derived from data strictly in those files themselves. It's pretty clear when you load up the binary in IDA Pro and compare the unmarshalling code with a standard Python interpreter's code
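The quoted comment mentions a remapped opcode table. As a toy illustration only (not Dropbox's or Talon's actual scheme), the sketch below applies a made-up opcode permutation to real CPython wordcode and then recovers it by aligning the stock and remapped encodings of the same code object. `SECRET_TABLE`, `remap` and `recover_opcode_map` are hypothetical, and the sketch assumes CPython 3.6+ wordcode, where the opcode sits at every even byte offset.

```python
import dis

# A made-up bijective permutation of 0..255 (7 is coprime with 256).
SECRET_TABLE = {op: (op * 7 + 3) % 256 for op in range(256)}


def remap(code: bytes, table: dict) -> bytes:
    # Hypothetical obfuscation: substitute every opcode byte (even offsets)
    # through `table`, leaving the argument bytes untouched.
    out = bytearray(code)
    for i in range(0, len(out), 2):
        out[i] = table[out[i]]
    return bytes(out)


def recover_opcode_map(stock: bytes, remapped: bytes) -> dict:
    # Align two encodings of the same code object and read off
    # remapped opcode -> stock opcode, rejecting inconsistent pairs.
    if len(stock) != len(remapped):
        raise ValueError("bytecode lengths differ; not the same code object")
    mapping = {}
    for i in range(0, len(stock), 2):
        seen = mapping.setdefault(remapped[i], stock[i])
        if seen != stock[i]:
            raise ValueError(f"conflicting opcode at offset {i}")
    return mapping


source = "total = 0\nfor n in range(10):\n    total += n * 2\n"
stock = compile(source, "<demo>", "exec").co_code
obfuscated = remap(stock, SECRET_TABLE)
recovered = recover_opcode_map(stock, obfuscated)
for fake, real in sorted(recovered.items()):
    print(f"{fake:3d} -> {real:3d} {dis.opname[real]}")
```

In practice the hard part is getting such an aligned pair at all: per the quote, the patched interpreter hides co_code, and per-file keys plus a modified unmarshaller sit in front of the bytecode, none of which this toy models.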

So the usual reverse-engineering tools (IDA, Ghidra, Frida, ...) are probably more useful here than a decompiler.

related: