rocky / python-uncompyle6

A cross-version Python bytecode decompiler
GNU General Public License v3.0

whilestmt38 contains a "_stmts" instead of one of "l_stmts", "l_stmts_opt", "pass" #498

Closed gdesmar closed 2 months ago

gdesmar commented 2 months ago

Description

Pysource's template_engine function triggers an assertion error because it found a _stmts instead of one of l_stmts, l_stmts_opt, pass.

How to Reproduce

The example can be generated by compiling the following with Python 3.8, using python -m compileall brokenwhile.py:

import time

r = 0
while r == 1:
    print(time.time())
    if r == 1:
        r = 0

The assertion in pysource's template_engine() fails with the following message:

AssertionError: at whilestmt38[2], expected to be in '('l_stmts', 'l_stmts_opt', 'pass')' node; got '_stmts'

Output Given

  [...]
  File ".../uncompyle6/semantics/pysource.py", line 436, in preorder
    super(SourceWalker, self).preorder(node)
  File ".../spark_parser/ast.py", line 112, in preorder
    self.default(node)
  File ".../uncompyle6/semantics/pysource.py", line 896, in default
    self.template_engine(table[key.kind], node)
  File ".../uncompyle6/semantics/pysource.py", line 764, in template_engine
    node[index[0]] in index[1]
AssertionError: at whilestmt38[2], expected to be in '('l_stmts', 'l_stmts_opt', 'pass')' node; got '_stmts'

Expected behavior

See the Workarounds section: with the proposed workaround, the original code is successfully recovered.

Environment

Uncompyle console output headers:

uncompyle6 version 3.9.1 Python bytecode version base 3.8.0 (3413) Decompiled from: Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0]

pydisasm, version 6.1.0

Python version used to compile the bytecode: Python 3.8.19

Executing on Debian 11 (Bullseye)

Workarounds

The cleanest workaround I see is to modify the table entry for whilestmt38 in customize_for_version38. (I would personally prefer to have _stmts as the first item, if I may, but this is simply to show the modification.)

            "whilestmt38": (
                "%|while %c:\n%+%c%-\n\n",
                (1, ("bool_op", "testexpr", "testexprc")),
-               (2, ("l_stmts", "l_stmts_opt", "pass")),
+               (2, ("l_stmts", "l_stmts_opt", "pass", "_stmts")),
            ),

Additional Context

I have a feeling this may be too naive of an approach. There may be deeper issues with allowing _stmts directly into the whilestmt38 codeblock. I may think that it would be part of l_stmts, or that the parser/ast need a tweak to understand that it is a l_stmts instead of a _stmts.

If the workaround is the solution, I am more than happy to open a Pull Request for it. If not, and you have a hint about where to look to better understand the problem, I am happy to help if needed.

rocky commented 2 months ago

Thanks for the detailed report, and proposed fix (which is correct).

I have a feeling this may be too naive of an approach.

Happily, I think the simple approach is correct. It will be fixed soon.

There may be deeper issues with allowing _stmts directly into the whilestmt38 codeblock. I may think that it would be part of l_stmts, or that the parser/ast need a tweak to understand that it is a l_stmts instead of a _stmts.

Let me explain the difference between stmts, _stmts and l_stmts. stmts is one or more statements (stmt), while _stmts is zero or more statements (stmts_opt might be a clearer name). l_stmts was supposed to be stmts augmented with the kinds of statements that can only be found in loops, such as break and continue.
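As a rough illustration of that distinction, here is a toy model in Python. It is not the uncompyle6 grammar: the three predicate functions, the LOOP_ONLY set and the statement-kind strings are all made up for this sketch, and each "statement list" is just a list of kind names.

```python
# Statement kinds that, in this toy model, may only appear inside a loop.
LOOP_ONLY = {"break", "continue"}


def is_stmts(kinds):
    """stmts: one or more statements, none of them loop-only."""
    return len(kinds) >= 1 and not (set(kinds) & LOOP_ONLY)


def is_underscore_stmts(kinds):
    """_stmts: zero or more statements, none of them loop-only."""
    return not (set(kinds) & LOOP_ONLY)


def is_l_stmts(kinds):
    """l_stmts: like stmts, but loop-only statements are allowed too."""
    return len(kinds) >= 1


if __name__ == "__main__":
    body = ["call", "if"]            # the loop body from the report, roughly
    print(is_stmts(body), is_underscore_stmts(body), is_l_stmts(body))
    print(is_stmts([]), is_underscore_stmts([]))
    print(is_stmts(["break"]), is_l_stmts(["break"]))
```

In this model a loop body with no break or continue satisfies all three predicates, which is one way to see why a parse can end up labelling it with a different nonterminal than the template expected.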

Right now, there is a bit of looseness here in the grammar. The distinction between the various kinds of "stmt" was made a while ago. I am currently in the process of redoing the entire grammar, where I hope to tighten things up.

In the very old Python 2.3 code, which made its way to Python 2.7 before I picked this up, there were no such reduction checks. That is, instead of:

... (2, ("l_stmts", "l_stmts_opt", "pass", "_stmts")), ... 

what was written was:

... 2, ...

Over time I have been adding in those additional checks. I sometimes get things wrong though. That's why I work on debuggers.
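The difference between the bare index and the checked (index, kinds) form can be sketched as follows. This is a simplified stand-in, not the code in pysource.py: the function resolve_child_index and the child kind names "child0" and "testexpr" are hypothetical, and only the shape of the assertion message is modelled on the one in the report.

```python
def resolve_child_index(rule, child_kinds, arg):
    """Return the child index named by a template argument.

    `arg` is either a bare index (no checking) or an
    (index, allowed_kinds) pair, in which case the kind of the
    selected child must be one of `allowed_kinds`.
    """
    if not isinstance(arg, tuple):
        return arg  # old style: trust the index blindly
    index, allowed = arg
    if isinstance(allowed, str):
        allowed = (allowed,)  # accept a single kind name as well
    got = child_kinds[index]
    assert got in allowed, "at %s[%d], expected to be in '%s' node; got '%s'" % (
        rule, index, allowed, got)
    return index


if __name__ == "__main__":
    # Hypothetical children of a whilestmt38 node; only position 2 matters.
    children = ["child0", "testexpr", "_stmts"]

    print(resolve_child_index("whilestmt38", children, 2))  # unchecked form

    fixed = (2, ("l_stmts", "l_stmts_opt", "pass", "_stmts"))
    print(resolve_child_index("whilestmt38", children, fixed))

    old = (2, ("l_stmts", "l_stmts_opt", "pass"))
    try:
        resolve_child_index("whilestmt38", children, old)
    except AssertionError as exc:
        print("AssertionError:", exc)
```

The unchecked form never fails here, which is convenient but hides grammar mismatches; the checked form surfaces them early, at the cost of needing the allowed list kept in step with what the parser can actually produce.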