rocky / python-xdis

Python cross-version bytecode library and disassembler
GNU General Public License v2.0

Disassembling failing in xasm format #93

Open issue, opened by Vaipex 2 years ago

Vaipex commented 2 years ago

Hi,

I'm currently trying to extract the bytecode, edit a few strings and assemble it back into a .pyc file. pydisasm without any flags works just fine, but as soon as I try to disassemble the file with pydisasm -F xasm ./file.pyc it fails with the following traceback:

Traceback (most recent call last):
  File "/usr/local/bin/pydisasm", line 33, in <module>
    sys.exit(load_entry_point('xdis', 'console_scripts', 'pydisasm')())
  File "/usr/lib/python3/dist-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/root/python-xdis/xdis/bin/pydisasm.py", line 72, in main
    disassemble_file(path, sys.stdout, format)
  File "/root/python-xdis/xdis/disasm.py", line 329, in disassemble_file
    disco(
  File "/root/python-xdis/xdis/disasm.py", line 160, in disco
    disco_loop_asm_format(opc, version_tuple, co, real_out, {}, set([]))
  File "/root/python-xdis/xdis/disasm.py", line 220, in disco_loop_asm_format
    disco_loop_asm_format(
  File "/root/python-xdis/xdis/disasm.py", line 220, in disco_loop_asm_format
    disco_loop_asm_format(
  File "/root/python-xdis/xdis/disasm.py", line 249, in disco_loop_asm_format
    assert mapped_name not in fn_name_map
AssertionError

I also printed out the vars from the assert:

mapped_name='listcomp_0x7f3d301932f0'

fn_name_map={'listcomp_0x7f3d30192ff0': 'listcomp', 'listcomp_0x7f3d301932f0': 'listcomp'}

rocky commented 2 years ago

In order for me to work on this, I'd need a complete short example: the pyc you started out with, its disassembly, the change to the assembly, and finally the resulting pyc. The shortest example that shows this is desirable.

Vaipex commented 2 years ago

I never got to the point of successfully disassembling the .pyc, so it's not a newly assembled file, but here is one of the failing files.

test.zip

rocky commented 2 years ago

Ah - I see what's up. If I or someone else doesn't answer this in a week or so, remind me.

Vaipex commented 2 years ago

alright, thank you!

Vaipex commented 2 years ago

any updates? @rocky

rocky commented 2 years ago

Here is my understanding of the situation.

Some background first.

For each list comprehension that appears in Python code, a code object is created for the "body" of the comprehension. For example, if you write:

[x + 1 for x in collection]

Parts of the disassembly will look like:

# Source code size mod 2**32: 26 bytes
# Method Name:       <module>
...
# Stack size:        2
# Flags:             0x00000040 (NOFREE)
# First Line:        1
# Constants:
#    0: <code object <listcomp> at 0x7fe0b8beb9f0, file "lc.py", line 1>
#    1: '<listcomp>'
#    2: None
# Names:
#    0: __file__
  1:           0 LOAD_CONST           (<code object <listcomp> at 0x7fe0b8beb9f0, file "lc.py", line 1>)
...
# Method Name:       <listcomp>
...
  1:           0 BUILD_LIST           0
               2 LOAD_FAST            (.0)

The function or method named <listcomp> is created for the body of the comprehension: the loop that evaluates x + 1 for each x in collection.
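For comparison, a similar header-plus-instructions listing can be produced for the same one-liner with the standard library's dis module. This is only a sketch: dis can disassemble bytecode for the running interpreter's version alone (which is the gap xdis fills), its layout differs from pydisasm's, and on Python 3.12 and later the comprehension is inlined (PEP 709), so no separate <listcomp> code object appears.

```python
import dis
import io

# The one-line example from above; "lc.py" only labels the code object.
module_code = compile("[x + 1 for x in collection]", "lc.py", "exec")

buf = io.StringIO()
dis.show_code(module_code, file=buf)  # header: name, flags, constants, names
dis.dis(module_code, file=buf)        # instructions; recurses into nested code objects
print(buf.getvalue())
```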

If there is another list comprehension, another code object with the same method name, <listcomp>, is created.
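The duplicate names are easy to observe with plain CPython. A minimal sketch, assuming a Python 3 interpreter older than 3.12 (from 3.12 on, PEP 709 inlines list comprehensions and the list below may be empty); the helper nested_code_objects is hypothetical:

```python
import types


def nested_code_objects(source):
    """Return the code objects stored directly in a module's constants."""
    module_code = compile(source, "<example>", "exec")
    return [c for c in module_code.co_consts if isinstance(c, types.CodeType)]


source = "a = [x + 1 for x in xs]\nb = [y * 2 for y in ys]\n"
codes = nested_code_objects(source)
names = [co.co_name for co in codes]
print(names)  # before Python 3.12: ['<listcomp>', '<listcomp>']
```

Two distinct code objects share one co_name, so a disassembler that needs unique labels has to add something to the name.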

The way the disassembler disambiguates the different <listcomp> methods is to append the hex address, e.g. 0x7fe0b8beb9f0, to the end of the name.
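The naming scheme can be sketched as follows. This is not xdis source: the helpers mapped_name and build_name_map are hypothetical, and the 'listcomp_0x...' to 'listcomp' shape is inferred from the values printed earlier in this issue. The assert mirrors the check that fails in the traceback.

```python
def mapped_name(co):
    """Hypothetical: '<listcomp>' -> ('listcomp_0x...', 'listcomp')."""
    base = co.co_name.strip("<>")
    return f"{base}_{hex(id(co))}", base


def build_name_map(code_objects):
    """Map each disambiguated name back to its plain name."""
    fn_name_map = {}
    for co in code_objects:
        name, base = mapped_name(co)
        # Same idea as `assert mapped_name not in fn_name_map` in the report.
        assert name not in fn_name_map, f"duplicate mapped name: {name}"
        fn_name_map[name] = base
    return fn_name_map


# Two distinct code objects that share the name '<lambda>'.
codes = [(lambda: 1).__code__, (lambda: 2).__code__]
print(build_name_map(codes))
```

Because both code objects are kept alive in the list, their addresses differ and the map gets two entries.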

Apparently there are two listcomp methods with the same name, hex address included.

I don't understand how that is possible, but apparently it is.
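One possible explanation, an assumption not confirmed in this thread: Python only guarantees that id() is unique among objects whose lifetimes overlap, so an address taken from an object that has since been freed can show up again on a new object. A name built from the address is then not guaranteed unique. The sketch below shows that caveat and a hypothetical counter-based alternative (build_name_map_with_counter is illustrative, not an xdis function):

```python
# id() is only guaranteed unique among objects alive at the same time.
# On CPython the two temporaries below often get the same address, but
# that is an implementation detail, not guaranteed either way.
print(id(object()) == id(object()))


def build_name_map_with_counter(code_objects):
    """Hypothetical alternative naming: number repeats instead of using id()."""
    counters = {}
    fn_name_map = {}
    for co in code_objects:
        base = co.co_name.strip("<>")
        index = counters.get(base, 0)
        counters[base] = index + 1
        fn_name_map[f"{base}_{index}"] = base  # e.g. "listcomp_0", "listcomp_1"
    return fn_name_map


codes = [(lambda: 1).__code__, (lambda: 2).__code__]
print(build_name_map_with_counter(codes))  # {'lambda_0': 'lambda', 'lambda_1': 'lambda'}
```

A counter does not depend on object lifetimes and gives the same names on every run, at the cost of depending on traversal order.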

I believe a simple workaround is to run the disassembler with a Python interpreter whose version matches that of the bytecode.

When that is done, the "native" code object structure is used instead of xdis' own structure for a code object, and I think no name mapping is needed.

I could be wrong here though.
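To see whether that workaround applies to a given file, the .pyc header can be compared with the running interpreter. A minimal standard-library sketch, assuming an ordinary .pyc whose first four bytes are the magic number; the helper pyc_matches_interpreter is hypothetical:

```python
import importlib.util


def pyc_matches_interpreter(path):
    """Return True if the .pyc's magic number matches this interpreter's."""
    with open(path, "rb") as f:
        magic = f.read(4)
    return magic == importlib.util.MAGIC_NUMBER


# Example: pyc_matches_interpreter("./file.pyc")
```

If this returns False, the bytecode comes from a different Python version, and a matching interpreter would be needed to try the workaround.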