python / cpython

The Python programming language
https://www.python.org
Other
63.49k stars 30.4k forks source link

Accidental name bindings with structural pattern matching #93059

Open JohnLockwood opened 2 years ago

JohnLockwood commented 2 years ago

Bug report

Your environment

Steps to reproduce:

Run the code in bug.py, attached.

bug.zip

The output on my Mac mini is:

% python bug.py
Cheese Sandwich

Search for 'match' in the code. Notice that one of the terms matches the only str in the list, "Cheese Sandwich". Later we create a dataclass of type PersonHeight. On the last line of the file, we print the type of one of the fields in a PersonHeight object. Note that it is not <class 'str'> as it should be, but "Cheese Sandwich"

If we comment out the match expression, we get the correct behavior (see comment below in code)

JelleZijlstra commented 2 years ago

The code in the sample boils down to this:

match "Cheese Sandwich":
    case str as s: pass

print(str)

The case str as s: pattern binds the "Cheese Sandwich" object to both the name str and the name s. You likely wanted to write case str() as s: instead, which matches to instances of the class str.

This is behaving as expected, but the behavior is indeed surprising. Perhaps we could give a SyntaxWarning for an as-pattern where the LHS is a name-pattern, or this could be handled in linters. (cc @brandtbucher for pattern matching.)

JohnLockwood commented 2 years ago

Fair enough, the bug is in my code -- that's the far more common case :). And the code was weird example code to begin with, so it's a corner case.

That said, I agree with you that this is highly unexpected, especially being able to bind a name by using an as-pattern.

Thanks for hopping on it so quickly and for everything else you're doing, however it gets decided

stevendaprano commented 2 years ago

As pattern matching becomes more common, I expect that this gotcha is going to catch many people. It is easy to forget the brackets.

I agree that it would be good to get a SyntaxWarning in the case where you have a name-pattern as the LHS of an as-pattern. Or even a SyntaxError? Is there any use-case for case spam as ham?

Another similar trap is:

match something:
    case str:  # oops, binds to name str not matching strings
        block

and now the builtin str has been shadowed. This is much easier to do by mistake than other forms of builtin shadowing, e.g. str = value, and far less likely to be deliberate.

For example see this example where somebody accidentally shadowed the builtins int and str and then got confused. Easy to happen.

I think that it will be good to raise a SyntaxWarning if a bare name-pattern is one of the builtin names {str, int, bool, float, list, tuple, set} and not just rely on linters. That won't catch all such errors, but I expect that it will catch a good proportion of them.

It would be good to get a pre-emptive FAQ about it too.

brandtbucher commented 2 years ago

I'd be happy to receive PRs for improved docs, but I'm -0 on SyntaxWarnings here. There's just no precedent for warning against name shadowing. I also believe there are legitimate use-cases for using as to bind two or more names (case [first, *_, last] | [first as last]: ...).

For the record, irrefutable patterns that block other patterns are already compile-time errors:

>>> match "Spam":
...     case str: ...
...     case int: ...
... 
  File "<stdin>", line 2
SyntaxError: name capture 'str' makes remaining patterns unreachable
>>> match "Spam":
...     case str as s: ...
...     case int as i: ...
... 
  File "<stdin>", line 2
SyntaxError: name capture 'str' makes remaining patterns unreachable
>>> match "Spam":
...     case str | int: ...
... 
  File "<stdin>", line 2
SyntaxError: name capture 'str' makes remaining patterns unreachable

Something I am open to exploring: modifying the above message to include something like "Did you forget to include parentheses?".

stevendaprano commented 2 years ago

There's just no precedent for warning against name shadowing.

There is never a precedent for anything until the first time we do it :-)

I also believe there are legitimate use-cases for using as to bind two or more names

Fair enough. I guess it remains something for linters to warn about.

JohnLockwood commented 2 years ago

I appreciate all the discussion so far. As regards this bit of it:

I also believe there are legitimate use-cases for using as to bind two or more names

Fair enough. I guess it remains something for linters to warn about.

I have a couple of questions. First, as regards name binding, of course, that's expected with respect to the name on the right-hand side (of as). That's what "as" does -- just as the LHS of the assignment operator. I'm not sure I agree on the left-hand side (of "as", which I take to be equivalent of the RHS of assignment). For example, in using with open("file", "r") as f:, the context manager returned by with gets bound to f.

The other thing that's unexpected here is that I was trying to use "str" as the name of the type, but the name of the type never got matched in the array, only an instance of the type.

Admittedly I've only been playing with this feature recently, this issue cropped up very early in my work. So I'm willing to accept that I'm not seeing the bigger picture here, that there's a reason for passing in an instance of a type that binds the instance of the type to the name of the type when the name of the type is on the left of the as expression, but to me that's as unexpected as being able to do this:

foo = str

And then have foo bound to str. This is binding str to foo, no? Aren't LHS and RHS reversed in "as"?

I'm seeing the example @stevendaprano sent, where he commented:

"Hi Gerald,

“case str” matches everything and assigns it to the name “str”?

Yes, that’s exactly what I’m saying!

Its an easy mistake to make. I can see this becoming a gotcha for Python programmers, but new and experienced, for decades to come. "

I agree wholeheartedly with that last assessment.

qub1750ul commented 1 year ago

Hello, sorry for necrobumping. I got to learn Python recently and immediately stumbled on the named constants matching recommendation. Existing docs confused me greatly on the matter and i've got to discover this issue to clearly comprehend what was going on. The main tutorial should indeed spend a few more words clarifying interpreter behavior instead of redirecting to PEP-636, who is not really good at clearing waters given the difference in writing style.