gvanrossum / patma

Pattern Matching
1.02k stars 65 forks source link

AND patterns #14

Open gvanrossum opened 4 years ago

gvanrossum commented 4 years ago

Just found these in F#: https://docs.microsoft.com/en-us/dotnet/fsharp/language-reference/pattern-matching#and-pattern

This is a simple extension of the | (OR) pattern -- e.g. we could write

case [x, y] & [0|1, 0|1]: print("Two bits:", x, y)

But we can do this using the walrus operator:

case [x := 0|1, y := 0|1]: ...
brandtbucher commented 4 years ago

Not a huge fan of | and & here. To plug the idea of a stdlib module again, this could be:

case And([Or(0, 1), Or(0, 1)], [x, y]): print("Two bits:", x, y)

A nice benefit is that it could be broken up:

Bit: Pattern = Or(0, 1)
...
case And([Bit, Bit], [x, y]): print("Two bits:", x, y)

It would take a very long time for me to look at 0|1 and read "either 0 or 1"... especially when referring to bit patterns! :wink:

gvanrossum commented 4 years ago

I'm not sure about &, but | seems necessary for constants at least (usually not combined with value extraction). The Or(1, 2) notation seems awfully verbose for that.

Also note PEP 604, which introduces | as a way to spell unions in the type system.

I can see with your background in hardware that | means bitwise OR to you -- but most people don't learn that any more... :-)

brandtbucher commented 4 years ago

Also note PEP 604, which introduces | as a way to spell unions in the type system.

Yeah, but that feels different to me since it uses normal operator overloading on the type object. This seems much more magical.

How do you feel about using the or/and/not keywords here instead? I would have a much better time following:

case [x, y] and [0 or 1, 0 or 1]: print("Two bits:", x, y)

Although this is probably harder to implement as a POC.

I do understand the where you're coming from, though. If I have pre-built Pattern objects, it is nice to be able to combine/modify them with | (and possibly &, ~, and ^?) before using them in a pattern matching statement. I think the hard part for me is mentally wrapping constants or names in a pattern constructor before applying the operator:

0 | 1 is really ConstantPattern(0) | ConstantPattern(1) FOO | BAR is really ConstantPattern(FOO) | ConstantPattern(BAR)

At least using keywords we make it clearer that this is magic, I think.

But my opinions on this are not very strongly held. They're different ways to spell the exact same move. I just think the current choice is a bit hard to intuit.

viridia commented 4 years ago

The use of the vertical bar operator to indicate a union is exists in several programming languages I am familiar with, but not Python. However, in these languages the 'union' we are talking about is a type declaration, not a match expression.

For example, in my TypeScript code, I often use unions of singletons in place of enums, e.g.:

type Size = 'xlarge' | 'large' | 'medium' | 'small' | 'xsmall';

const buttonSize: Size = 'large';

(I don't know if MyPy supports the concept of a 'singleton type' - a type that can only be a single value. The above example is a union of 5 singleton types, which means that a variable or parameter of that type can only be assigned those specific values.)

Using this pattern is less verbose than using an enum (Size.SMALL) and just as type safe. And the IDE is equally capable of auto-suggesting either.

This is not possible in Python due to the requirement that type annotations be evaluated using the same semantics as regular Python code, unlike the exception that you are making for match expressions, which is to say that they conform to Python syntax but the ASTs are interpreted differently.

So the question I am asking is, because this syntax is un-pythonic but common in other languages, what will be the most intuitive for Python programmers?

Tobias-Kohn commented 4 years ago

I would even argue that the bar | is not that uncommon, even for Python programmers. A well established version of pattern matching are regular expressions, and | is not only very common there to mean "or", or "match one of", respectively, but it is also familiar to Python programmers as such.

Moreover, the or keyword does a bit of additional magic in Python, as it converts both sides to booleans first. From the point of view of logic, or connects statements (which are either true or false), whereas | is an operator on values---that happens to be "bitwise or" in the case of integers, but is undefined otherwise. Based on this distinction, I would certainly favour the | over the or.

Finally, I would make a case that operator overloading is already very common in Python and in fact one of its key strengths. Depending on context, + means addition, concatenation, or vector addition, for instance. This makes libraries like NumPy very convenient and intuitive to use. Despite some "corner cases" where two possible interpretations clash, e.g., if a = (1, 2) and b = (3, 4) are both tuples, I get a+b = (1, 2, 3, 4), but if they are vectors (which might look about the same), I get a+b = (4, 6).

Anyway, circling back to the initial topic of "and" patterns, I do not think there is any need for this (given that we have the more convenient walrus operator). I can understand that, from an engineer's perspective, there is the question of what to do with this extra "and" operator lying around after we have already used the "or" operator :). But I feel pattern matching is first and foremost about comparing a structure against complex values. The "and" operator would thus read as "I want x to be value A and value B at the same time". I really cannot see any sensible use case for something like this in general. But if there is, it could probably just as easily be implemented using guards, anyway.

gvanrossum commented 4 years ago

Okay, this confirms my intuition: use '|' and don't have an '&' operator at all.

Tobias-Kohn commented 4 years ago

I had a closer look at pattern matching in F#. One of their use cases for the & operator in pattern matching is for testing attributes. Given a vector Vec(x, y, z), they extract the attributes via (I am citing the syntax rather freely and rather write it as we probably would):

case attr('x', x) & attr('y', y) & attr('z', z): ...

In Python, I would write the very same simply as:

if x := getattr(it, 'x') and y := getattr(it, 'y') and z := getattr(it, 'z'): ...

As we lean much more towards a scheme where matching attributes is part of the core with simply case Vec(x, y, z):, their primary use case still does not make sense for us.

Additionally, a language called Thorn (by IBM) also uses AND patterns and give the following use case to check whether a sequence contains the numbers 3, 4, and 5 in any order:

case [..., 3, ...] && [..., 4, ...] && [..., 5, ...]: ...

With the in operator we already have a very powerful tool to do such a test in Python. In all honesty, I feel this use case is a bit artificial.

So, just to reiterate: although other languages do have an AND pattern, which might even make sense there, I still feel that we should not include it.

ilevkivskyi commented 4 years ago

I agree. Note btw that a named sub-pattern (using walrus) is essentially an & between a name pattern and other pattern (as Guido suggested in OP). And this is IMO the only useful case for & logic in Python I have seen so far.