zrax / pycdc

C++ python bytecode disassembler and decompiler
GNU General Public License v3.0

Segmentation fault on lambda decoding #274

Open tarhan opened 2 years ago

tarhan commented 2 years ago

I'm trying to use pycdc on a file which apparently contains a lambda (I'm not sure what it is doing). During decoding, pycdc outputs Url = (lambda and then crashes.
The crash occurs in ASTree.cpp:

            fputs("(lambda ", pyc_output);
            PycRef<ASTNode> code = node.cast<ASTFunction>()->code();
            PycRef<PycCode> code_src = code.cast<ASTObject>()->object().cast<PycCode>();

The problem is the cast to ASTObject in code.cast<ASTObject>(). The code variable contains an ASTComprehension, so the dynamic_cast produces a PycRef<ASTObject> holding a null pointer. That is understandable, since ASTComprehension inherits directly from ASTNode and is not an ASTObject.

Where can I learn how to read the disassembler output correctly, so that I can work out how to modify the decompiler? I could not find a detailed description of the opcodes and their arguments. More importantly, I do not fully understand the hierarchical output of the disassembler, especially the initial section of each level.
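The null cast described above can be reproduced in isolation. The following is only a minimal sketch: Node, ObjectNode, ComprehensionNode and describe_code are hypothetical stand-ins, not the actual pycdc classes, and the only thing they share with the real code is the inheritance shape (two types deriving directly from a common polymorphic base). It shows that a dynamic_cast between sibling types yields a null pointer, and one possible way to guard against dereferencing it.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the node types; only the inheritance shape
// matters here (both derive directly from the common base).
struct Node { virtual ~Node() = default; };
struct ObjectNode : Node { std::string payload = "<code>"; };
struct ComprehensionNode : Node {};

// Returns the payload if `node` really is an ObjectNode, or a fallback
// marker instead of dereferencing a null pointer.
std::string describe_code(Node* node)
{
    // A pointer dynamic_cast to an unrelated sibling type yields nullptr;
    // it does not throw.
    if (auto* obj = dynamic_cast<ObjectNode*>(node))
        return obj->payload;
    return "<unsupported node>";
}
```

Passing a ComprehensionNode to describe_code returns the fallback string instead of crashing. Whether the decompiler should report an error at that point or learn to handle the comprehension case is a separate design question that this sketch does not answer.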

From my current understanding of the disassembler output, this is the part that pycdc could not handle:

                        416     LOAD_CONST              18: <CODE> <listcomp>
                        418     LOAD_CONST              19: 'main.<locals>.download.<locals>.<listcomp>'
                        420     MAKE_FUNCTION           0
                        422     LOAD_GLOBAL             26: re
                        424     LOAD_METHOD             27: finditer
                        426     LOAD_CONST              20: '^([^#].*ts(?:$|\\?\\S+$))'
                        428     LOAD_FAST               3: m3u8
                        430     LOAD_ATTR               28: m3u8Data
                        432     LOAD_GLOBAL             26: re
                        434     LOAD_ATTR               29: M
                        436     CALL_METHOD             3
                        438     GET_ITER
                        440     CALL_FUNCTION           1
                        442     STORE_FAST              12: fragmentsUrl
                        444     LOAD_FAST               12: fragmentsUrl
                        446     BUILD_LIST              0
                        448     COMPARE_OP              2 (==)
                        450     POP_JUMP_IF_FALSE       462
                        454     LOAD_GLOBAL             30: error
                        456     LOAD_CONST              21: 'Fragments is empty. Kindly report this bug'
                        458     CALL_FUNCTION           1
                        460     POP_TOP
                        462     LOAD_GLOBAL             31: Ripper
                        464     LOAD_DEREF              2: arg
                        466     LOAD_DEREF              0: outputFolderName
                        468     LOAD_FAST               6: outputFileName
                        470     LOAD_FAST               7: subtitlesPath
                        472     LOAD_FAST               12: fragmentsUrl
                        474     LOAD_FAST               3: m3u8
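One way to read a listing like the one above is to track the value stack by hand. The sketch below does this symbolically for offsets 416 to 442 only. It is an illustration under stated assumptions, not how pycdc is implemented: Instr and simulate are hypothetical names, the stack effects follow the opcode descriptions in the dis documentation for CPython releases that still have LOAD_METHOD, CALL_METHOD and CALL_FUNCTION, and LOAD_METHOD is simplified to occupy a single stack slot.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One simplified instruction: opcode name, printable operand, and the
// argument count used by the call opcodes.
struct Instr { std::string op; std::string arg; int count; };

// Symbolically executes a straight-line sequence and returns the statement
// produced by the last STORE_FAST. No validation: the input is assumed to
// be well-formed, so the stack never underflows.
std::string simulate(const std::vector<Instr>& code)
{
    std::vector<std::string> stack;
    auto pop = [&stack]() -> std::string {
        std::string s = stack.back();
        stack.pop_back();
        return s;
    };
    // Pops n arguments (pushed left to right) and joins them with commas.
    auto pop_args = [&pop](int n) -> std::string {
        std::string args;
        for (int i = 0; i < n; ++i) {
            std::string a = pop();
            args = args.empty() ? a : a + ", " + args;
        }
        return args;
    };

    std::string result;
    for (const Instr& in : code) {
        if (in.op == "LOAD_CONST" || in.op == "LOAD_GLOBAL" || in.op == "LOAD_FAST") {
            stack.push_back(in.arg);
        } else if (in.op == "LOAD_ATTR" || in.op == "LOAD_METHOD") {
            // Simplification: the method and its receiver share one slot.
            stack.push_back(pop() + "." + in.arg);
        } else if (in.op == "MAKE_FUNCTION") {
            // Drop the qualified name; the code object left underneath
            // now stands for the new function.
            pop();
        } else if (in.op == "GET_ITER") {
            stack.push_back("iter(" + pop() + ")");
        } else if (in.op == "CALL_FUNCTION" || in.op == "CALL_METHOD") {
            std::string args = pop_args(in.count);
            std::string callee = pop();
            stack.push_back(callee + "(" + args + ")");
        } else if (in.op == "STORE_FAST") {
            result = in.arg + " = " + pop();
        }
    }
    return result;
}
```

Fed the instructions from 416 to 442 (with the regular expression abbreviated to PATTERN), this yields fragmentsUrl = <listcomp>(iter(re.finditer(PATTERN, m3u8.m3u8Data, re.M))). That is the usual shape of a list comprehension over re.finditer(...) in these CPython versions: the comprehension body lives in the nested <listcomp> code object, which is not part of the excerpt.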

PS: I'm sorry, I cannot upload the *.pyc files to public hosting and paste a link here.

ahaensler commented 2 years ago

There is official documentation for opcodes https://docs.python.org/3.11/library/dis.html

Compiler output changes with each new version of Python. The compiler is getting more and more efficient. Output is less structured and it is getting harder to decompile with simple rules.

tarhan commented 2 years ago

"There is official documentation for opcodes https://docs.python.org/3.11/library/dis.html"

I've seen that page. Is there a more detailed and complete description?

"it is getting harder to decompile with simple rules."

That is understandable. It is the same with Hex-Rays for IDA Pro.