rocky / python-uncompyle6

A cross-version Python bytecode decompiler
GNU General Public License v3.0
3.82k stars 410 forks source link

FYI: from 3.8.0 to 3.9.0, `pos_arg` wrapper around positional arguments was removed #428

Closed jpivarski closed 1 year ago

jpivarski commented 1 year ago

This is neither a bug nor a feature request, but you might not be aware that function calls in uncompyle6 3.8.0 like this:

call (7)
     0. expr
         L.  39         0  LOAD_GLOBAL              xy_xy
     1. pos_arg
        expr
                            2  LOAD_FAST                'lib'
     2. pos_arg
        expr
                            4  LOAD_FAST                'x1'
     3. pos_arg
        expr
                            6  LOAD_FAST                'y1'
     4. pos_arg
        expr
            call (5)
                 0. expr
                    attribute37 (2)
                         0. expr
                                            8  LOAD_GLOBAL              x
                         1.                10  LOAD_METHOD              rhophi
                 1. pos_arg
                    expr
                                       12  LOAD_FAST                'lib'
                 2. pos_arg
                    expr
                                       14  LOAD_FAST                'rho2'
                 3. pos_arg
                    expr
                                       16  LOAD_FAST                'phi2'
                 4.                18  CALL_METHOD_3         3  ''
     5. pos_arg
        expr
            call (5)
                 0. expr
                    attribute37 (2)
                         0. expr
                                           20  LOAD_GLOBAL              y
                         1.                22  LOAD_METHOD              rhophi
                 1. pos_arg
                    expr
                                       24  LOAD_FAST                'lib'
                 2. pos_arg
                    expr
                                       26  LOAD_FAST                'rho2'
                 3. pos_arg
                    expr
                                       28  LOAD_FAST                'phi2'
                 4.                30  CALL_METHOD_3         3  ''
     6.                32  CALL_FUNCTION_5       5  ''

have become this in uncompyle6 3.9.0:

call (7)
     0. expr
         L.  39         0  LOAD_GLOBAL              xy_xy
     1. pos_arg
        expr
                            2  LOAD_FAST                'lib'
     2. pos_arg
        expr
                            4  LOAD_FAST                'x1'
     3. pos_arg
        expr
                            6  LOAD_FAST                'y1'
     4. pos_arg
        expr
            call (5)
                 0. expr
                    attribute37 (2)
                         0. expr
                                            8  LOAD_GLOBAL              x
                         1.                10  LOAD_METHOD              rhophi
                 1. expr
                                   12  LOAD_FAST                'lib'
                 2. expr
                                   14  LOAD_FAST                'rho2'
                 3. expr
                                   16  LOAD_FAST                'phi2'
                 4.                18  CALL_METHOD_3         3  ''
     5. pos_arg
        expr
            call (5)
                 0. expr
                    attribute37 (2)
                         0. expr
                                           20  LOAD_GLOBAL              y
                         1.                22  LOAD_METHOD              rhophi
                 1. expr
                                   24  LOAD_FAST                'lib'
                 2. expr
                                   26  LOAD_FAST                'rho2'
                 3. expr
                                   28  LOAD_FAST                'phi2'
                 4.                30  CALL_METHOD_3         3  ''
     6.                32  CALL_FUNCTION_5       5  ''

(The expr is no longer wrapped by a single-argument pos_arg.)

It's likely because of this change in https://github.com/rocky/python-uncompyle6/commit/2ed211e0d4a5b53d3f373ca05afd4655bf48b192:

- # number of apply equiv arguments:
- nak = (len(opname_base) - len("CALL_METHOD")) // 3
- rule = (
-     "call ::= expr "
-     + ("pos_arg " * args_pos)
-     + ("kwarg " * args_kw)
-     + "expr " * nak
-     + opname
- )
+ if opname == "CALL_METHOD_KW":
+     args_kw = token.attr
+     rules_str = """
+          expr ::= call_kw_pypy37
+          pypy_kw_keys ::= LOAD_CONST
+     """
+     self.add_unique_doc_rules(rules_str, customize)
+     rule = (
+         "call_kw_pypy37 ::= expr "
+         + ("expr " * args_kw)
+         + " pypy_kw_keys "
+         + opname
+     )
+ else:
+     args_pos, args_kw = self.get_pos_kw(token)
+     # number of apply equiv arguments:
+     nak = (len(opname_base) - len("CALL_METHOD")) // 3
+     rule = (
+         "call ::= expr "
+         + ("expr " * args_pos)
+         + ("kwarg " * args_kw)
+         + "expr " * nak
+         + opname
+     )

From the commit message and the fact that the change adds an if clause but mostly keeps the else clause like the old code (with the exception of writing "expr " instead of "pos_arg "), I don't think this was an intentional change. It might not have been noticed because the pos_arg tree element was only ever wrapping expr as a single argument, so decompiled code would look the same.

We noticed this because we use uncompyle6 to ensure that a suite of functions in our codebase use a restricted set of features, such as preventing keyword arguments. Here's where we updated our test to be insensitive to the differences between uncompyle6 3.8.0 and 3.9.0: https://github.com/scikit-hep/vector/pull/305#issuecomment-1387451286.

jpivarski commented 1 year ago

Well, maybe it is a bug because some calls have the pos_arg wrapper and others don't. I could see how that would be undesirable. I'll label this as a bug (if I can; I'll find out in a moment), and if you disagree, you can unlabel it, close this issue, etc.

jpivarski commented 1 year ago

Nope, I can't label it.

rocky commented 1 year ago

> Nope, I can't label it.

Issue labels now have been added although it is more of a quirkiness than a bug.

I think I see what is wrong - the non-positional argument when the call has keyword arguments is actually at the other end; the expression, or more precisely, the list expression is added to collect the values attached to the keyword arguments.

Actually, this is correct: the first "expr" is the name of the function. It is not a positional argument.

rocky commented 1 year ago

Also, when reporting things like this please follow more closely the information requested in https://github.com/rocky/python-uncompyle6/blob/master/.github/ISSUE_TEMPLATE/bug-report.md

I suspect if that had been done here, the situation would have been more easily identified.

jpivarski commented 1 year ago

Thanks!

At first, I didn't think it was a bug report, so I didn't follow that template. Then I changed my mind.

rocky commented 1 year ago

Whether it is a bug report or some other kind of report, there are principles in the bug report template to follow. Like not just give output but give input. (This is kind of a basic reporting kind of thing, but it also appears in the bug-report template.)

rocky commented 1 year ago

@jpivarski First, let me apologize. I misread and misunderstood which part of the code you were pointing to. You described the section to look at correctly, but I looked at the wrong part in my usual sleep-deprived, and time-limited state.

When I reread again, I see that yes, this was a simple change (I can't really call it a "typo" since it isn't wrong) of using "expr" when for reasons that I didn't understand were used by someone or some project.

The parse tree in the current master branch is now the way it was in 3.8.0.

But let me come back again to the items mentioned in the template. Reducing this a smaller and simpler example would have helped. And there is a section called "Context" which is where you'd link to that other place. So these things help me. Again, my apologies.

jpivarski commented 1 year ago

It's okay, it's your project and I didn't read the templates.

In our Scikit-HEP discussions (https://gitter.im/HSF/PyHEP-histogramming), we've been talking about whether incoming users can be expected to click on the "no template" link as an alternative to the templates—it's unclear whether it can be considered a part of the normal workflow. In those discussions, I did learn that it's possible to remove the "no template" link, which you could do if you want to ensure that users see the templates.

Actually, you solved it more quickly than I had expected.


Oh! Independent of this issue, I should mention the following since I have your attention: I know it's hard to keep up with the changing bytecode of recent Python versions. Recent Python versions are (unintentionally) obfuscating the bytecode in attempts to make the interpreter faster, but that also makes it harder to reverse engineer, to build a parse tree in your case or to convert to LLVM by the Numba library.

In the Numba developers group, I've brought up the idea of petitioning core Python to add a feature (a PEP) for decorators to be able to prevent functions from fully optimizing their bytecode—certain kinds of decorators would be able to ask for functions to have easily interpretable bytecode, at the expense of speed. It would be an opt-in feature that wouldn't affect anybody who doesn't know about it.

I've said that Numba and uncompyle6 are two use-cases that could significantly benefit from this. The Numba group thought it was interesting, but never went very far with it. What do you think? Would it be helpful even if it came into effect in some later version of Python? (For the record, I have no idea what a PEP timescale would be like.)

Also, I could move this to a Discussion or other forum if you prefer not having it on this issue.

rocky commented 1 year ago

I confess there is much that I don't know in this area. (It is very true that keeping up to date with current Python bytecode is near impossible for this done as a one of many side projects. Another side project Mathics3, I hope will be able to make use of the work in scikit-hep/vector).

Just the other day, I came across https://mail.python.org/archives/list/python-dev@python.org/thread/XZ7KDCI3TXEUERU3YIFKC543GAGIYG6Q/ which seems to indicate, if I have this correct, that Python 3.11 does have Pseudo-ops at some stage in its compilation, but strips this out later. And I think the "bytecode" author may be thinking along the same lines as you are.

rocky commented 1 year ago

In our Scikit-HEP discussions ... I did learn that it's possible to remove the "no template" link, which you could do if you want to ensure that users see the templates.

Thanks! Added now!