gvanrossum / patma

Pattern Matching
1.02k stars 64 forks source link

Disallow guards when using an "open pattern"? #153

Open gvanrossum opened 4 years ago

gvanrossum commented 4 years ago

Towards the end of #140 -- which is about changing walrus patterns (v := P) to AS patterns (P as v) -- there's a side discussion about introducing guards for subpatterns, e.g.

case C(a if a >= 0, b if b >= 0):

as a more efficient way to spell

case C(a, b) if a >= 0 and b >= 0:

I don't think we should add that to the current proposal, but if we want to keep that door open for a future PEP, we should probably disallow

case a, b if a >= 0 and b >= 0:

since with sub-guards we'd want the guard to apply only to the b subpattern.

It is easy to syntactically prohibit this and force users to write

case (a, b) if a >= 0 and b >= 0:

Once the PEP is accepted and the code is merged into the 3.10 branch, we would have a much harder time to change the semantics of case a, b if ... and if we ever wanted to introduce sub-guards we'd be stuck with an ugly inconsistency.

Thoughts? We can also just decide to close the door for sub-guards (even with code colorization, the examples here look rather busy).

stereobutter commented 4 years ago

I hope the authors don't mind me posting here, if not I am happy to remove my comment.


Even if sub-guards never become a thing

case (a, b) if a >= 0 and b >= 0:

reads a bit easier to me than:

case a, b if a >= 0 and b >= 0:

because:

The same can be said for OR-Patterns (imho. either both types of patterns should require parentheses or none of them):

case (a | b) if a >= 0 and b >= 0:

vs.

case a | b if a >= 0 and b >= 0:
Tobias-Kohn commented 4 years ago

To be honest, I think guards for subpatterns are a terrible idea and not something we should ever truly consider. It is a nice example of over-engineering where the desire to add something is probably more driven by 'because we can' rather than an actual need.

Such guards in subpatterns mean that patterns are no longer 'static'. In principle, any such pattern could have a wide range of side-effects. Not only does it make it very hard to reason about such structures with possible hidden dependencies, it also basically destroys the notion of an efficient decision tree to find the correct pattern for a subject. Hence, what looks like an optimisation at first glance turns out to ruin optimisations on a much bigger scale.

That being said, the examples by Jean-Abou-Samra demonstrate indeed that there might be confusion about the exact meaning. Disallowing guards when using an "open pattern" therefore seems like a good idea just to eliminate another possible 'foot-gun'. But we really should stop there. If we just flip it around and say that case x | (x, 0) if x > 0: is unclear in its current semantics, but would be more intuitive if the guard only applied to the (x, 0) part, that seems highly illogical to me. Either it is ambiguous and a possible 'foot-gun' which we should avoid, or it can be defined one way or another and then we can stick with what we already have.

Finally, I think @SaschaSchlemmer makes a good point concerning readability and some consistency. Adding parentheses probably really is a good idea, although sub-guards clearly are not.

gvanrossum commented 4 years ago

There are already many situations in Python where for readability you add parentheses even if the syntax doesn't require them. For example:

    a = (b ^ c) | d
    if (a == b and c == d) or e == f: ...
    t = (1, 2, 3)
    return (x, y, z)

I would definitely not change the grammar to disallow an unparenthesized OR pattern before a guard:

case [x] | x if x >= 0:  # This is fine

OTOH if we consider the syntactic priority of if in expressions, if binds stronger than a comma, so it's still reasonable to disallow an open sequence pattern before a guard. The grammar would become a little bit less regular:

case_block:
    | "case" pattern [guard] ':' block
    | "case" open_pattern ':' block

@brandtbucher -- your thoughts?

Tobias-Kohn commented 4 years ago

Interestingly, this aligns well with my first reaction. I felt that putting tuples into parentheses seems fine, but for alternatives it did not seem quite right. But I actually dismissed it as just a feeling and not something I could put my finger down. Anyway, I would certainly be absolutely fine with your distinction of requiring parentheses for tuples with guards, but not for alternatives with guards.

brandtbucher commented 4 years ago

I do not have a strong opinion whether it is right to add sub-guards or not. Drawing from my experience with PEP 614, though, I believe it may be wise to restrict the grammar for the time being.

I'm not too concerned with the regularity of the grammar. I think most users don't realize how complicated the grammar is for basic things like function parameters, for example. This may be one of those cases where making things "feel" simpler actually requires adding complexity under-the-hood.

My initial thought is that whatever we do with sequence patterns should also be done with or-patterns, though your conviction suggests there may be something I'm missing here. If I understand correctly and we decide to add sub-guards in the future, then:

case Foo(x) if x >= 0 | Bar(x) if x < 0:

...would probably be parsed as:

case ((Foo(x) if x >= 0) | Bar(x)) if x < 0:

If we wanted to avoid this foot-gun, we would need to either force the addition of "unnecessary" parentheses:

case (Foo(x) if x >= 0) | (Bar(x) if x < 0):

...or have users fold the guards into the next inner closed pattern (if possible):

case Foo(x if x >= 0) | Bar(x if x < 0):

None of these seem ideal, especially if we could sidestep them now. But this isn't a hill I'm willing to die on, either.

gvanrossum commented 4 years ago

I'm in favor of having fixed priorities for different "operators" (like comma and if) even in vastly different contexts. This makes human scanning easier. (There's also a parsing technique that depends on this.)

Now, there are a few other places where Python violates this, always around commas -- in particular, sometimes '=' binds tighter than comma (keyword args) and sometimes comma binds tighter than = (assignment), but I'd rather not add more of this.

In particular, currently | always binds tighter than if. So I don't think we should allow

case pat1 if expr1 | pat2 if expr2:

even if we were to introduce guards for subpatterns, and so Brandt's

case Foo(x) if x >= 0 | Bar(x) if x < 0:

should just be an error. Both of Brandt's solutions (parenthesize each pattern+guard, or combine the conditions in a single guard) would work, and I see nothing wrong with that.

There's actually a third solution (if we had guards for subpatterns)

case Foo(x if x >= 0) | Bar(x if x < 0):

because if we were to introduce guards on subpatterns, we should go all the way.

But I think I am convinced by Tobias' argument that this isn't desirable anyway. So my proposal is:

and the grammar for this would still be

case_block:
    | "case" pattern [guard] ':' block
    | "case" open_pattern ':' block
gvanrossum commented 4 years ago

I consider this accepted; it needs to be changed in the spec and implementation, and it needs to be explained in the rationale for "case block".

gvanrossum commented 4 years ago

Despite all the :+1:s I am reopening, to argue that it doesn't matter.

Suppose the current proposal supports

case P1, P2 if C: ...  # (1)

as meaning

case (P1, P2) if C: ...  # (2)

Now suppose in the future we want to change the syntax so we can write (among other things)

case P1 if C1, P2 if C2: ...  # (3)

which is grouped as like this

case (P1 if C1), (P2 if C2): ...  # (4)

This would imply that (1) is parsed differently, like this

case P1, (P2 if C): ...  # (5)

Now, the natural semantics of guards and variable bindings make (5) equivalent to (2) -- in either case the condition C is only checked after patterns P1 and P2 have succeeded, so we might as well have a rule that any bindings made by P1 will be accessible to C. This would constrain us to matching subpatterns from left to right (at least if a guard could observe the difference) but that seems a very small price to pay.

IOW, I have changed my mind and I think the current syntax, allowing (1), is fine and doesn't need to be changed. (What people put in their style guides remains to be seen -- I don't think it's going to be confusing or hard to read in general.)

brandtbucher commented 4 years ago

Heh, great point. This seems fine as-is.

Tobias-Kohn commented 4 years ago

Very nice observation, indeed. Fully agreed that it is fine as-is.

gvanrossum commented 4 years ago

Okay, marking as "rejected", i.e. we will continue to allow guards with open patterns. I don't think we need to bring up all this subtlety in the rationale.