python / cpython

The Python programming language
https://www.python.org
Other
63.32k stars 30.31k forks source link

Incorrect evaluation order of function arguments with *args #67505

Open 3620556a-52eb-410d-958f-a012cdc382e8 opened 9 years ago

3620556a-52eb-410d-958f-a012cdc382e8 commented 9 years ago
BPO 23316
Nosy @gvanrossum, @terryjreedy, @bitdancer, @serhiy-storchaka, @NeilGirdhar, @gvanrossum, @iritkatriel

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

Show more details

GitHub fields: ```python assignee = None closed_at = None created_at = labels = ['interpreter-core', 'type-bug', '3.11'] title = 'Incorrect evaluation order of function arguments with *args' updated_at = user = 'https://bugs.python.org/JoshuaLandau' ``` bugs.python.org fields: ```python activity = actor = 'iritkatriel' assignee = 'none' closed = False closed_date = None closer = None components = ['Interpreter Core'] creation = creator = 'Joshua.Landau' dependencies = [] files = [] hgrepos = [] issue_num = 23316 keywords = [] message_count = 25.0 messages = ['234672', '234673', '234675', '234697', '234698', '234702', '234703', '234704', '234756', '234773', '235083', '395929', '395951', '395953', '395962', '395965', '395968', '395970', '395972', '395974', '395975', '395978', '396006', '396009', '396369'] nosy_count = 9.0 nosy_names = ['gvanrossum', 'terry.reedy', 'r.david.murray', 'SilentGhost', 'Joshua.Landau', 'serhiy.storchaka', 'NeilGirdhar', 'Guido.van.Rossum', 'iritkatriel'] pr_nums = [] priority = 'low' resolution = None stage = 'needs patch' status = 'open' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue23316' versions = ['Python 3.11'] ```

3620556a-52eb-410d-958f-a012cdc382e8 commented 9 years ago

It is claimed that all expressions are evaluated left-to-right, including in functions¹. However,

f(*a(), b=b())

will evaluate b() before a().

¹ https://docs.python.org/3/reference/expressions.html#evaluation-order

8977102d-e623-4805-b309-cd0ad35b0d72 commented 9 years ago

It seems, if I read https://docs.python.org/3/reference/expressions.html#calls correctly that the evaluation order of the function arguments is not defined in general, as it depends on your use of keyword argument and exact function signature. Naturally, f(a(), b()) would be evaluated in the order you're expecting.

bitdancer commented 9 years ago

The resolution of bpo-16967 argues that this should probably be considered a bug. It certainly goes against normal Python expectations. I think it should also be considered to be of low priority, though.

2c693aaf-b711-4c74-9243-f6794805e1ba commented 9 years ago

I assume this is the problem:

>>> dis.dis('f(*a(), b=b())')
  1           0 LOAD_NAME                0 (f)
              3 LOAD_NAME                1 (a)
              6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              9 LOAD_CONST               0 ('b')
             12 LOAD_NAME                2 (b)
             15 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
             18 CALL_FUNCTION_VAR      256 (0 positional, 1 keyword pair)
             21 RETURN_VALUE

— looks fine.

>>> dis.dis('f(b=b(), *a())')
  1           0 LOAD_NAME                0 (f)
              3 LOAD_NAME                1 (a)
              6 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
              9 LOAD_CONST               0 ('b')
             12 LOAD_NAME                2 (b)
             15 CALL_FUNCTION            0 (0 positional, 0 keyword pair)
             18 CALL_FUNCTION_VAR      256 (0 positional, 1 keyword pair)
             21 RETURN_VALUE

Joshua, we could make function calls take:

x lists y dictionaries one optional list z dictionaries

but we as well do all the merging in advance:

one optional list one optional dictionary one optional list one optional dictionary

which is representable in three bits, but four is easier to decode I think.

2c693aaf-b711-4c74-9243-f6794805e1ba commented 9 years ago

actually, we accept alternation, so maybe a bit to say whether we start with a list or a dict followed by a length of the alternating sequence?

bitdancer commented 9 years ago

Neil: I presume you are speaking of your in-progress PEP patch, and not the current python code? If so, please keep the discussion of handling this in the context of the PEP to the PEP issue. This issue should be for resolving the bug in the current code (if we choose to do so...if the PEP gets accepted for 3.5 this issue may become irrelevant, as I'm not sure we'd want to fix it in 3.4 for backward compatibility reasons).

2c693aaf-b711-4c74-9243-f6794805e1ba commented 9 years ago

Yes, sorry David. I got linked here from the other issue. In any case, in the current code, the longest alternating sequence possible is 4. So one way to solve this is to change the CALL_FUNCTION parameters to be lists and dicts only and then process the final merging in CALL_FUNCTION.

2c693aaf-b711-4c74-9243-f6794805e1ba commented 9 years ago

another option is to add a LIST_EXTEND(stack_difference) opcode that would allow us to take the late iterable and extend a list at some arbitrary stack position. I had to add something like that for dicts for the other issue, so it would follow that pattern.

2c693aaf-b711-4c74-9243-f6794805e1ba commented 9 years ago

After thinking about this a bit more, my suggestion is not to fix it. Instead, I suggest that PEP-8 be modified to suggest that all positional arguments and iterable argument unpackings precede keyword arguments and keyword argument unpackings. Then, a tool like autopep8 is free to reorganize argument lists. Such reorganization will not be possible if f(a(), b=b()) is different than f(b=b(), \a()).

2c693aaf-b711-4c74-9243-f6794805e1ba commented 9 years ago

(I also suggest that the evaluation order within a function argument list to be defined to be positional and iterable before keyword, otherwise left-to-right — rather than strictly left-to-right).

terryjreedy commented 9 years ago

I expect left to right as documented (and designed by Guido). His OK or clarification would be to intentionally do differently.

iritkatriel commented 3 years ago

I don't think this is true anymore:

>>> def a(): 
...   print('a')
...   return (1,2)
... 
>>> def b():
...   print('b')
...   return (3,4)
... 
>>> def f(*args, b=None):
...   print('f')
... 
>>> f(*a(), b=b())
a
b
f
>>>
terryjreedy commented 3 years ago

The particular example of left-to-right function evaluation in https://docs.python.org/3/reference/expressions.html#evaluation-order is "expr1(expr2, expr3, *expr4, **expr5)".

Joshua's claim, without evidence, that "'f(*a(), b=b())' will evaluate b() before a()" was false, as shown by Neil with dis and is now, as shown by Irit with a code example.

But Neil also showed, again with dis, that a revised discrepancy claim, ""'f(b=b(), *a())' will evaluate a() before b()", is true, because reversing the order of these particular arguments does not reverse the evaluation order. The dis today is equivalent to the one 7 years ago, with CALL_FUNCTION_VAR expanded to two operations.

dis.dis('f(b=b(), *a())') 1 0 LOAD_NAME 0 (f) 2 LOAD_NAME 1 (a) 4 CALL_FUNCTION 0 6 LOAD_CONST 0 ('b') 8 LOAD_NAME 2 (b) 10 CALL_FUNCTION 0 12 BUILD_MAP 1 14 CALL_FUNCTION_EX 1 16 RETURN_VALUE

Irit's example, carried one step further confirms this.

>>> f(b=b(), *a())
a
b
f

Although I considered SilentGhost's reading of https://docs.python.org/3/reference/expressions.html#calls as too 'loose' (evaluation order of function argument *is* defined in general), that section address this exact case.

"A consequence of this is that although the *expression syntax may appear after explicit keyword arguments, it is processed before the keyword arguments ..."

I read this as an acknowledgement that this order violates the general rule.

I do wonder whether exception is specific to using a stack machine and implementation convenience or whether *all* implementations must follow this order. If C-Python specific, it should be labelled as such.

The doc continues with an example call, for a different 'f', where, unlike Irit's 'f', the result is "TypeError: f() got multiple values for keyword argument 'a'". It concludes "It is unusual for both keyword arguments and the *expression syntax to be used in the same call, so in practice this confusion does not arise."

The problem with mixing unpacking and named values might be even clearer if "f(b=1, *(2,1))" were added to the example box. Perhaps *exp after x=exp should be prohibited.

If the exception is not removed from the implementation, then perhaps it should be added to the evaluation order section. Something like

"The one exception is in function calls with *expression after a keyword argument: f(x=expr2, *expr1)."

gvanrossum commented 3 years ago

"The one exception is in function calls with *expression after a keyword argument: f(x=expr2, *expr1)."

So the question before us is whether to add this phrase to the docs, or to tweak the compiler to fix it. It is certainly convenient to update the docs rather than expend the resources to change the compiler for what looks like a rare corner case.

How certain are we that this is truly the only exception? Can someone read the compiler source code related to argument order, or experiment with all the different permutations of positional args, keyword args, *args, and **kwargs? (Fortunately the evaluation order is independent from the function definition, so it's really just all permutations of those four things.)

terryjreedy commented 3 years ago

The following does not consider keyword-only args with or without defaults. My understanding is the the compiler does not know or considered the signature of the function when evaluating the args.

from dis import dis
from itertools import permutations as pm  # My first use of this.

for a1, a2, a3, a4 in pm(('a', 'b=b', '*g', '**k')):
    s = f"e({a1}, {a2}, {a3}, {a4})"
    print(s)
    try:
        cd = compile(s, '', 'eval')
    except SyntaxError as e:
        print(e)
        continue
    dis(cd)

Since positional arg cannot follow keyword arg or keyword arg unpacking and positional arg unpacking cannot follow keyword arg unpacking, only 5 orders of 'a', 'b=b', '*g', and '**k' are legal.

e(a, b=b, *g, **k) g before b e(a, *g, b=b, **k) normal e(a, *g, **k, b=b) normal e(g, a, b=b, **k) normal e(g, a, **k, b=b) normal

For whatever reason, the byte code is more complicated without function arguments.

e(a, b=b, *g, **k)
  1           0 LOAD_NAME                0 (e)
              2 LOAD_NAME                1 (a)
              4 BUILD_LIST               1
              6 LOAD_NAME                2 (g)
              8 LIST_EXTEND              1
             10 LIST_TO_TUPLE
             12 LOAD_CONST               0 ('b')
             14 LOAD_NAME                3 (b)
             16 BUILD_MAP                1
             18 LOAD_NAME                4 (k)
             20 DICT_MERGE               1
             22 CALL_FUNCTION_EX         1
             24 RETURN_VALUE

The exceptional case essentially says that positional arg unpacking cannot follow keyword args. This is consistent with the other 3 prohibitions. Arguments passed by position form one group, those passed by name another. The nice, simple, and consistent rule is that positional args (and unpacking thereof) cannot follow keyword args (and unpacking there).

But in this one case, instead of raising SyntaxError like the other 3 prohibited orders, the compiler in effect rewrites* the code in a legal order -- and then compiles as if written that way. How often does it rewrite code to make it correct?

I think it would have been better to have stuck with 'positional before keyword'. Can we consider deprecating something rare and easy to get wrong?

Thinking more, I might prefer leaving the Evaluation Order section alone, as it is correct with respect to the internal rewrite, and revise the explanation of the quirk in the function doc. We could deprecate the mis-ordering at least in the doc.

serhiy-storchaka commented 3 years ago

See also thread "Order of positional and keyword arguments" on the Python-Dev mailing list (https://mail.python.org/archives/list/python-dev@python.org/thread/I6AUYV6DVEMP7XVYVDWC62N6PK6FHX3H/).

gvanrossum commented 3 years ago

Well, it seems we're stuck -- we can't make the grammar more restrictive (at least, I don't think we should, since it is a backwards incompatibility), so we'll have to live with the exception. Since it is now clear what the rule is, we can update the docs.

2c693aaf-b711-4c74-9243-f6794805e1ba commented 3 years ago

FYI: https://github.com/PyCQA/pylint/issues/4586

serhiy-storchaka commented 3 years ago

Would not be be better in long term to get rid of irregularities? It would make the grammar simpler and the documentation clearer.

The only use case of such syntax in wrappers, but they always can be rewritten in other style, with natural order of arguments evaluation.

    def wrapper(*args, **kw):
        return wrapped_fn(*args, some_arg=1, **kw)
iritkatriel commented 3 years ago

It should be possible to make the compiler emit code that evaluates the arg values left to right in all cases.

gvanrossum commented 3 years ago

Looks like Terry accidentally removed Serhiy. Adding him back.

terryjreedy commented 3 years ago

(The nosy list change was an accident of my local copy not being complete refreshed before posting.)

If (b=b, *c) were evaluated in order, then the byte code for b=b and any subsequent keyword arguments would have to be put aside, such as in a separate buffer, until it was known that there would be no following *exp. Without lookahead, this is known when either **kw or closing ) is reached. At that point, the keyword buffer would be copied to the main buffer.

It might actually be easier to remove all order restrictions and compile all keyword values to a side buffer, to be copied to the main buffer when the closing ) is reached.

One version of the question I am raising is this: given that f(a, b=b) and f((a,), b=b) have the same effect (are alternate spellings of the same instruction(s)), why should f(b=b, a) and f(b=b, *(a,)) *not have the same effect, with one spelling being prohibited and the other not?

The meaning of '*expression' is defined as having the same effect as an equivalent sequence of positional argument expression in the same place as the expression.

"If the syntax *expression appears in the function call, expression must evaluate to an iterable. Elements from these iterables are treated as if they were additional positional arguments. For the call f(x1, x2, *y, x3, x4), if y evaluates to a sequence y1, …, yM, this is equivalent to a call with M+4 positional arguments x1, x2, y1, …, yM, x3, x4."

The first sentence of the next paragrapsh, allowing only *exp to follow b=exp, contradicts the last line above. I am sorry I did not read these paregraphs carefully until now.

gvanrossum commented 3 years ago

I'm currently not very interested in philosophizing about why we allow one syntax but not the other.

terryjreedy commented 3 years ago

I was pointing out what to me is a second related contradiction in the doc, between one sentence and the next.

iritkatriel commented 3 years ago

See also bpo-33492.