python / cpython

The Python programming language
https://www.python.org
Other
63.49k stars 30.41k forks source link

Bind greedily assigns kwargs to args, which can cause bugs when binding to decorated functions #102551

Open Samreay opened 1 year ago

Samreay commented 1 year ago

Bug report

Several repositories (like Prefect) make use of deferred execution of functions. They utilise inspect.Signature to create a bound method, and turn a list of parameters into args and kwargs to be passed in *args, **kwargs.

When it works fine, it looks like this:

import inspect

def add(a, b, c=100):
    return a + b + c

bound = inspect.signature(add).bind(1, 2, c=100)
print(f"args: {bound.args}, kwargs: {bound.kwargs}, result: {add(*bound.args, **bound.kwargs)}")

And this prints out:

args: (1, 2, 100), kwargs: {}, result: 103

Notice that the 100 has moved from a kwarg into an arg, but that's fine, it will still run.

Things get complicated when decorators are introduced, and it seems base python has no method of getting args and kwargs that will work with both the wrapped and unwrapped signature.

The below code has two methods, each decorated, and both of them fail the bound execution.

from functools import wraps
import inspect

def decorator(fn):
    @wraps(fn)
    def wrapper(a, b, **kwargs):
        print(f"kwargs are {kwargs}")
        return fn(a, b, **kwargs)

    return wrapper

@decorator
def add(a, b, c=100):
    return a + b + c

def decorator2(fn):
    @wraps(fn)
    def wrapper(a, b, c=100):
        return fn(a, b, c=c)

    return wrapper

@decorator2
def add2(a, b, **kwargs):
    return a + b + kwargs.get("c", 100)

try:
    bound_signature = inspect.signature(add).bind(1, 2, c=100)
    print(bound_signature.args, bound_signature.kwargs)
    add(*bound_signature.args, **bound_signature.kwargs)
except Exception as e:
    print("Failure in first method: ", e)

try:
    bound_signature = inspect.signature(add2, follow_wrapped=False).bind(1, 2, c=100)
    print(bound_signature.args, bound_signature.kwargs)
    add(*bound_signature.args, **bound_signature.kwargs)
except Exception as e:
    print("Failure in second method: ", e)

And fails with:

(1, 2, 100) {}
Failure in first method:  wrapper() takes 2 positional arguments but 3 were given
(1, 2, 100) {}
Failure in second method:  wrapper() takes 2 positional arguments but 3 were given

They fail because of that default behaviour of treating kwargs as args.

In terms of expected behaviour, I would have assumed that a signature on a wrapped method, when resolving to args and kwargs, would be able to assign parameters between args and kwargs reliably.

Potential Solution

I notice that in inspect.py -> BoundArguments -> args/kwargs properties that some logic might be modified here.

Instead of making all _POSITIONAL_OR_KEYWORD parameters args, I feel the language would be more correct to assign them based upon whether or not the parameter.default is Parameter.empty. This may fix the issue and stop greedy conversion to arguments.

Your environment

Linked PRs

gaogaotiantian commented 1 year ago

I made a PR for this issue and listed the basic ideas in #102820.

I've confirmed that this will fix the examples given above.

The PR is still a prototype, and I probably need a review for whether that's the correct direction. If we decided to do that, we need to fix the unittest library or the unittest tests - basically, for unittest.mock.assert_called_with, does (1, 2) counts the same as (a=1, b=2) (personally I disagree on it, but we can go either way).

hauntsaninja commented 1 year ago

Thanks, but I'm not sure this is a correct change to make to BoundArguments.

BoundArguments is really just meant to be a mapping from parameter values to argument values. It's not meant to remember how things were passed into bind. This is quite clear from the documentation. The fact that changing this e.g. breaks unittest means this is an important part of how the API is used (and therefore subject to backward compatibility guarantees, even if we did think we should change what this API was meant to do). If you want to remember how things were passed, I'd just pass args and kwargs around.

Note that the issue in this PR is only a concern in the case that you're using wraps in a decorator that doesn't actually preserve the callable signature, like in your example where your decorator means that c can no longer be passed via position. Consider instead setting the __signature__ attribute? For example, see this PR I made to pandas: https://github.com/pandas-dev/pandas/pull/48693

Overall, I'm not sure there should be a change made here. Maybe the docs could be a little more explicit about the fact that args and kwargs being "dynamically computed from arguments" implies they don't remember how they were bound.

Samreay commented 1 year ago

Consider instead setting the __signature__ attribute?

Yeah, there are workarounds, but not ones I can implement (ie this is hitting me from another library and I don't have control over the code).

Backwards compatibility is a right pain for sure, but this all comes down in my mind to what is the intended or best behaviour of BoundArguments. The decorator example just shows how the unintitive behaviour of the method can cause bugs in code that should run perfectly fine, but we can dispense with it to simplify the topic.

From the doco it stays that bind "Create a mapping from positional and keyword arguments to parameters."

Given that description, the fact that the below two lines don't preserve the information passed in seems like an obvious flaw in the method. I explicitly bind a kwarg, and the method forcefully turns it into an arg and there is nothing I can do about it.

bound_signature = inspect.signature(add).bind(1, 2, c=100)
print(bound_signature.args, bound_signature.kwargs)
# (1, 2, 100) {}

Perhaps we should agree on whether or not this is ideal or expected behaviour from the method, and can then discuss fixes? From what I see theres the proposal @gaogaotiantian has submitted. Another implementation might be to check the default value of the Parameter in the get args and kwargs methods to correctly put kwargs into kwargs. A third implementation might be to add a separate method or arguments to args/kwargs to keep backwards compatibiltiy and allow the user the choice of "should kwargs be greedily assigned to args or not".

gaogaotiantian commented 1 year ago

I agree that the first thing is to figure out - what is the proper behavior of BoundArguments.

Let's forget about the unittest failure, and focus on the class itself. I can understand if the BoundArguments is simply a mapping for all the arguments, but in that case, args and kwargs are basically meaningless. What's the point to give args and kwargs, if it's simply a mapping that holds the values passed in to each argument? It's just wrong. Well it's often correct with the heuristic, but fundamentally it's wrong. "Calculate args and kwargs from the arguments" does not make sense. (However, I agree that in this case, BoundArguments itself has a clear definition and it does make sense)

Checking default value won't help this situation because BoundArguments now does not have enough information to deduct what's the kwargs. In this specific case, the current implementation of BoundArguments will have exactly the same data if you do bind(1, 2, 100) and bind(1, 2, c=100). If you want to distinguish the two cases, you need extra data.

Now let's circle back to the unittest, the documentation actually never uses this "feature/bug".

>>> mock = Mock()
>>> mock.method(1, 2, 3, test='wow')
<Mock name='mock.method()' id='...'>
>>> mock.method.assert_called_with(1, 2, 3, test='wow')

It does not check for (1, 2, 3, 'wow'), it specifically checked against (1, 2, 3, test='wow')(same thing in the examples below), which I think is how people normally think.

# To me, this assertion pass would actually makes me feel confused
>>> mock.method.assert_called_with(1, 2, 3, 'wow')
Samreay commented 1 year ago

What's the point to give args and kwargs, if it's simply a mapping that holds the values passed in to each argument?

+1. If the functionality to extra args and kwargs is in the class, then its part of how the class is going to be used.

If you want to distinguish the two cases, you need extra data.

Ah right, I see what you mean. Checking default using the signature parameters means that any deliberate "Im doing this as an arg!" would now greedily become a kwarg. So same problem as before, just reversed.

Completely agree that the second assert which would pass would not be expected

gaogaotiantian commented 1 year ago

Went back on this issue today and realized in PEP 362 it's clearly stated

Arguments which could be passed as part of either *args or **kwargs will be included only in the BoundArguments.args attribute.

An example was given as well:

def test(a=1, b=2, c=3):
    pass

sig = signature(test)
ba = sig.bind(a=10, c=13)

>>> ba.args
(10,)

>>> ba.kwargs:
{'c': 13}

Like it or not, this is the designed behavior thus not a bug. We should probably document it more clearly though. I'll close the PR.

Samreay commented 1 year ago

Yeah, it's hard to argue with the PEP even if I don't know why that decision was made. Ah well.