Proposing a simpler match protocol: ditch it

dmoisset commented 4 years ago

I was about to describe some weird cases and limitations and issues I think the __match__() protocol has, but instead of that I think something simpler can be done: getting rid of it. (Note: I'm talking about the function, not about __match_args__)

What would the PEP would look like without it? The default behaviour described by the current PEP would be instead the only possibility. Class patterns would check first if the object is an instance of the pattern class; and then match directly on the matched object (no proxies).

Why I'm proposing this:

There is no proposed usage of this feature (overriding __match__) in the PEP. Even in the discussion (python-dev/this repo) most suggestions are quite artificial.
We don't need it to implement in any stdlib class (which would be a strong argument for it). Or in positive terms, the PEP without that function is good enough to cover the whole stdlib and most useful examples you've come up with.
the PEP is shorter and easier to explain, the implementation simpler.
We can add it later. When we do, we'll do it with actual users' code applying this feature and hitting real limitations.
This idea fits like a glove into around half of the design tenets in #101
For the isinstance check, we still have some customizability because isinstance already has a customizable protocol: __instancecheck__.
The prototype doesn't even test it, even if a lot of it is tested :) (I know it's a prototype, but this is also shows that no one cared about it right now)

Some precedents:

the other Python pattern matching statement doesn't need anything like this ;-)
In general other languages don't provide ways to "customize" this either
- Minor exception: Scala has an unapply method to transform things into structs which is more similar to the approach I hinted in the notes I shared in python-dev and is somewhat more related to what we do with __match_args__.

The scenarios I see if we get rid of __match__() are:

You are using a feature with a class you're writing for use it this way, or that you previously owned. If you need to match with something that's not accessible, you can add a property or attribute to your class with whatever you want to match. That's much easier to do than creating a proxy object.
You are using this with a class you don't own. You can use a guard (which shouldn't look much worse than the code you'd write in a world without pattern matching) or even in a more complicated case, write a proxy and apply it.

Take the example https://github.com/gvanrossum/patma/issues/66#issuecomment-643842341 : in scenario 1 you just add a connection_name property returning f"{host}:{port}". In scenario 2 you can match Connection(host, port) and use a guard f"{host}:{port}" == "python.org:80" (which would be similar to the if statement you'd write without this PEP)

If you look at the history of this idea it appears at first in #8 when the idea of checking all sort of things with a pattern library was under consideration, or for parametrization as in #24 . These ideas were deferred, and the current __match__() semantics doesn't easily support those anyway, so it's time to rethink why we still have it.

brandtbucher commented 4 years ago

I find this well-reasoned and quite persuasive. If we went this route, we would probably still need a way for classes to opt-out of the match protocol (currently done with __match__ = None). Probably __match_args__ = None?

As mentioned, this is a two-way door and could have the nice effect of giving all class matches a small performance bump.

gvanrossum commented 4 years ago

Yeah, this sounds attractive (though it won't help much solving the load/store issue). It has two origins: Tobias' 2018 prototype had an __unapply__, and Ivan's original PEP (#19) has a __match__. Both got it directly from Scala.

@brandtbucher Why would we even need a way for classes to opt out?

A note on the special cases for int(i), str(s) etc.: these could be done by setting __match_args__ = ["__self__"] and adding a property like this to those types:

@property
def __self__(self):
    return self

viridia commented 4 years ago

Huh. Worth thinking about...

I think the only downside is that if in the future we decide to add a __match__ function, there will be a small performance penalty added for testing whether the method exists, which may affect existing code. (Whereas if we have it from the beginning, programmers will accept the penalty as a trade-off.)

viridia commented 4 years ago

On the issue of the special-case for int and friends, I was thinking __match_args__ = True.

viridia commented 4 years ago

Actually, here's a brainstorm: what if the future match protocol is __match_args__ = <function>. That way we only have to look up one symbol. As to what that function's signature will be, we can decide that later.

gvanrossum commented 4 years ago

I'd prefer not to have funny values for __match_args__ (not even None). That kind of hack always ends up consuming a lot of energy for static type checkers.

brandtbucher commented 4 years ago

Why would we even need a way for classes to opt out?

The current spec allows classes to opt-out, and I guess I assumed we had a good reason for that buried in our past discussions somewhere. I can't think of it off of the top of my head though, and can't find any record of it...

So yeah, probably not needed.

brandtbucher commented 4 years ago

This actually makes things a lot cleaner.

Okay, I support it.

I think the only downside is that if in the future we decide to add a __match__ function, there will be a small performance penalty added for testing whether the method exists, which may affect existing code. (Whereas if we have it from the beginning, programmers will accept the penalty as a trade-off.)

If the only downside is that it may get imperceptibly slower in the future rather than always being imperceptibly slower, I think that's totally fine. I'm not even sure that's a downside. :wink:

dmoisset commented 4 years ago

Actually, here's a brainstorm: what if the future match protocol is __match_args__ = <function>. That way we only have to look up one symbol. As to what that function's signature will be, we can decide that later.

The whole point of this is that right now, it's not worth discussing what the match protocol will be, and that's fine because we don't have to know. We don't even know if it will be a single function, two, or perhaps allowing a function per attribute in match args, or something completely different.

gvanrossum commented 4 years ago

The whole point of this is that right now, it's not worth discussing what the match protocol will be, and that's fine because we don't have to know.

In defense of @viridia, during the development of this PEP we've been talking a lot about one-way doors and two-way doors. If we have an indication that a decision is a two-way door we can stop worrying. The observation that in the future we could set __match_args__ = <function> without breaking backwards compatibility shows that it's a two-way door even if we were to worry about the cost of doing a single attribute lookup. But I actually agree with Brandt that we needn't worry about that.

gvanrossum commented 4 years ago

Okay, I am also in favor.

natelust commented 4 years ago

I am worried that by mixing these two interfaces it will introduce less flexibility. What if someone wanted to create an object that matched a few different forms of something. Say a URI class that accepted a properly formatted str, or a version agnostic IP class, or something that could handle ints as enum values. With a __match__ method this could be a small overhead, but possible a win for code readability. The same behavior could be implemented inside __instancecheck__ and thus in the proposed match protocol, but it would also fire anytime you did an instance check in any other context.

Maybe that is an argument against why matching like this should not be done, or why it should be done some other way like inheritance, and everything will be ok. I just wanted to consider that if match syntax is functionality that is richer than a simple isinstance check, is it ok to take over the lower level functionality. Will this be incompatible with any __instancecheck__ that people have written?

Edited to reflect, not backwards incompatible. It would not break old code, there is just a chance that old code does not work with match syntax.

viridia commented 4 years ago

So first off, I am on board with the proposal to ditch __match__ for now.

As far as the issue around __match_args__, I think what I was really trying to say was that we could support more complex protocols by returning a more complex object (subject to the trade-offs surrounding type checkers dealing with varying types), or possibly a function which returns a more complex object. I'm not necessarily proposing that we actually do this, rather what I am saying is that we have the option in the future to extend match in arbitrary ways, without the overhead involved in looking up an additional property on the class dict - I assume a fundamental type check ("is this an object?") is much cheaper than looking up a name, and we're doing that test already, so there would be no additional overhead for classes that used the old protocol. (Although if we did decide to take this route in the future, it would be better if we named the property __match_spec__ or something that doesn't imply that it's a list of args.

gvanrossum commented 4 years ago

So this needs a huge amount of editing to the PEP. The description of the __match__ algorithm should probably not just be thrown out but changed into a full description of how class patterns are implemented. But there are dozens of other mentions of __match__. Can I ask for a volunteer to work on this? (Let this issue be the place where this is coordinated.)

viridia commented 4 years ago

I will start working on re-writing the 'deferred ideas' section at least.

viridia commented 4 years ago

OK finished that: https://github.com/python/peps/pull/1483

I was going to work on 'rejected ideas' next, but it doesn't mention the __match__ method at all apparently.

Tobias-Kohn commented 4 years ago

Sorry, I am obviously late to the party (living in a different time zone). But I am clearly against ditching the __match__ protocol! It seems a bit overly hasty to me to ditch such a central feature within less than twelve hours, particularly in comparison to how long we are mulling over syntactic issues!

First off, it is not only Scala that has this kind of protocol, but is also an important feature in F#, say (various less known experimental languages had this feature as well). So, before just getting rid of it, we might want to pause for a moment and consider why these languages saw the necessity to introduce it in the first place. And secondly, it is not just some obscure extension mechanism, but actually a central piece to unify OOP with pattern matching that has evolved over decades. Before we just ditch it, let's perhaps briefly talk about it, shall we?

The issue with classical pattern matching is that it requires direct access to the structure of data. This is in stark contrast to the basic principles of OOP such as encapsulation and abstraction (these play a somewhat lesser role in Python, of course). In other words: we basically want to base the deconstruction of an object on an abstract interface and not the internal representation.

In the functional programming languages where pattern matching originated, the tuple is basically the backbone of data structures and constructors take a number of arguments and put them into a tuple, (together with a type marker). Hence, the signature of constructors fully reflect the data structure of the object they create. This is no longer the case with fully OOP, where an object's structure can significantly deviate from the signature of its constructor. We, too, discussed this issue and it gave rise to the __match_args__, which takes on the role of a de-constructor from that point of view. I.e., this is an inverse signature to replace the signature of the constructor to honour this discrepancy between constructor and created object. So far so good.

The interesting part comes now if we acknowledge that there is often more than one single way for representing data, and that given data might be interpreted in different ways. For instance, complex numbers can both be written in rectangular coordinates as a + bi as well as in polar form r cis(phi). The value 0x41 can mean the integer 65 or the latter 'A'. This insight gave rise to what you may call views, active patterns, or unapply methods: the idea to separate the query representation of data in pattern matching from their actual implementation in a specific object.

For instance, in their paper on pattern matching in F#, Don Syme et al. motivated active patterns through the complex number example I alluded to above, here translated to Python (I am not following the full protocol here, but trying to give an impression of the original idea):

class Polar:
    def __match__(subject):
        if isinstance(subject, complex):
            return cmath.polar(subject)
        else:
            return None

match c:
    case Polar(1.0, _): ...
    case Rect(x, y) if x**2 + y**2 == 1: ...

On the other hand, given some byte stream, we might want to try and decode it until we find the correct way to interpret the given data. The example I give here is almost ridiculously simple, but I hope to better convey the idea this way:

match some_byte_stream:
    case ASCII(text): ...
    case UTF8(text): ...
    case UTF16(text, endianess): ...

An example taken from the Scala book would be the following:

class Email:
    def __match__(text):
        if text.count('@') == 1:
            i = text.index('@')
            return (text[:i], text[i+1:])
        else:
            return None

match user_input:
    case Email(user_name, server):
        ...
    case user_name:
        ...

What all these examples have in common is that they clearly separate data from representation. By explicitly not doing an isinstance check, but allowing data to be fitted to a structure that is most useful for the current situation, we end up with an enormously powerful and versatile tool.

In a simplified way, we could say that the __match__protocol supports conditional type casts of highly complex data structures.

Are you sure you want to throw all this power overboard?

brandtbucher commented 4 years ago

@Tobias-Kohn

Are you sure you want to throw all this power overboard?

Yeah, but only because it's big enough to sink the boat and powerful enough to swim on its own.

I cringe a bit every time I need to say or type the words "match proxy" when discussing this proposal with others. It's distracting, and much of the feedback we got on the PEP was from people stuck trying to understand what exactly was happening when they used a class pattern. I don't want users to fear those calls (they're already confusing enough for many people), and just telling them "it's sugar for isinstance" clarifies the meaning instantly. Most people just don't need __match__, so I don't want to tempt them to override it by sticking it on everything they create and devoting so much space to explaining and specifying it in the PEP.

Which brings me to my next point: adding methods to object is something that should not be taken lightly. This method will be inherited by literally every instance as soon as this patch is merged. That's a big, pervasive change that's ~very hard~ impossible to undo; this proposal to ditch __match__ is so compelling to me because it gives us a great deal of the match functionality without having to do that (because it fits into Python's existing object model so nicely). If we're going to add a method to object, I want to be 110% sure it's perfectly suited to the needs of those who are going to override it.

Thanks for your examples, by the way. While they do a good job of selling the abstraction of views over a single object, they currently aren't served very well by the __match__ method. Two of your examples return tuples (not proxies with mapped attributes), so as it stands your patterns would be more like Email([user_name, server]) and Polar([1.0, _]) (and even then they require the special behavior we give some of the built-in types). These examples are light enough that wrapping the return values in a proxy makes the __match__ protocol (as it's currently designed) seem much less attractive. Your second example is less compelling to me, because I can't imagine much else happening in each suite besides pass, and just using text and maybe endianess below (in other words, it could probably just be a function call that returns a two-tuple). Any more complicated parsing for binary formats would probably be better served by an actual parser to prevent backtracking.

I think that if a __match__ protocol were added later (and it very well may), it would be optional, and likely also look like the more powerful context-aware variants that have been proposed by others. But I don't think we need that yet, and I think the PEP has a better chance of acceptance with hooks like these being praised from the comfort of "Deferred Ideas".

Tobias-Kohn commented 4 years ago

Yes, I did not go through the entire __match__ protocol. My point, after all, is not about the exact protocol, but about the fact that having one is a very powerful and versatile tool. If the __match__ returned just a tuple to match positional arguments after all, I could live with that (and I think that was more or less my original proposal :wink:). The full proxy then evolved as a compromise.

But let us just take a step back and not concentrate on the exact protocol for a moment. The suggestion is then to just throw everything out, basically reducing pattern matching to isinstance checks with attribute binding. There will be no such thing as duck typing in this anymore, only hard coded types.

brandtbucher commented 4 years ago

But let us just take a step back and not concentrate on the exact protocol for a moment. The suggestion is then to just throw everything out, basically reducing pattern matching to isinstance checks with attribute binding. There will be no such thing as duck typing in this anymore, only hard coded types.

Sure there is, __instancecheck__!

gvanrossum commented 4 years ago

I will have to ponder this for a while.

brandtbucher commented 4 years ago

I guess, for me, it comes down to this:

Does the PEP need __match__ (even as a totally optional protocol), and if not, will it significantly benefit from it?

I don't think so. I agree that once people are comfortable with pattern matching in Python and we have a better idea of its use-cases and limitations at scale, we absolutely should consider it. But I think it currently just complicates an already complicated proposal.

gvanrossum commented 4 years ago

Hm. I've seen some people confused by the (indeed very complex) description of __match__ in the first draft. But I don't recall seeing significant opposition based on its presence.

Much of the remaining description of __match__ in the current draft is about describing how matching happens after __match__ has returned (including how __match_args__ is used), so it cannot be cut.

Perhaps you can submit a PR that shows how the PEP would be simplified if we were to ditch __match__?

dmoisset commented 4 years ago

Perhaps you can submit a PR that shows how the PEP would be simplified if we were to ditch __match__?

I can do that

viridia commented 4 years ago

I think we will have a custom matching protocol eventually, but I don't think that the current protocol defined in the PEP is the right one - and so leaving it out gives us the time to develop the right one.

I think that there is a tension between simplicity and power: we started out with a relatively powerful match protocol that was complex and not terribly efficient, and then we reduced that to one which was simple but not very powerful. The match protocol as currently defined can't handle many of the interesting and fanciful use cases that people have come up with, since it has no access to the 'match signature' as was discussed in earlier threads. The argument in favor of the simpler protocol was that those use cases weren't realistic, but the problem with that argument is that we don't know that for sure.

Initially, I think we all believed that a custom match protocol was absolutely required because we couldn't support matching in the standard library without adding custom matching to selected classes. What's changed (for me anyway) is the recognition that __match_args__ by itself is all the customization that the standard library needs. So now we have the option to defer __match__ which we didn't before.

I suspect that when we are finally ready to settle on a custom matching protocol, it will look quite different from the current proposal:

It will be a lot more flexible.
It will have additional context passed in to __match__.
It won't be as performant, but that's OK since it will be used extremely rarely.
It won't cost anything if you don't use it (speed-wise at least - it may have overhead costs in terms of code generated.)

(To that last point I am thinking about the analogy to C++ RTTI - turning that compiler flag on adds extra information to every class, but doesn't have a speed penalty unless you actually do a dynamic cast.)

brandtbucher commented 4 years ago

@dmoisset Sorry, I thought he was talking to me (I already finished something and will have a PR up in a moment).

dmoisset commented 4 years ago

No problem ☺️

On Tue, 30 Jun 2020, 18:29 Brandt Bucher, notifications@github.com wrote:

@dmoisset https://github.com/dmoisset Sorry, I though he was talking to me (I already finished something and will have a PR up in a moment).

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/gvanrossum/patma/issues/115#issuecomment-651936780, or unsubscribe https://github.com/notifications/unsubscribe-auth/AABRAHNLGAKMSJMDHMDTNFLRZIOIDANCNFSM4OLUJQLQ .

brandtbucher commented 4 years ago

python/peps#1484

dmoisset commented 4 years ago

(...) The issue with classical pattern matching is that it requires direct access to the structure of data. This is in stark contrast to the basic principles of OOP such as encapsulation and abstraction (these play a somewhat lesser role in Python, of course). In other words: we basically want to base the deconstruction of an object on an abstract interface and not the internal representation.

Getting rid of __match__ does not violate encapsulation in any way: we have __match_args__ which describes attribute names, and those attributes can be fully encapsulated properties expressing abstract interface.

(...) The interesting part comes now if we acknowledge that there is often more than one single way for representing data, and that given data might be interpreted in different ways. For instance, complex numbers can both be written in rectangular coordinates as a + bi as well as in polar form r cis(phi). The value 0x41 can mean the integer 65 or the latter 'A'. This insight gave rise to what you may call views, active patterns, or unapply methods: the idea to separate the query representation of data in pattern matching from their actual implementation in a specific object.

ok, looking at the F# examples (I was not aware of active patterns, they look fun) and your descriptions at least now I think I understand where were you aiming at with the initially released PEP (thank you!). There still may be a way to achieve that in different ways than the current match protocol (which still isn't very practical to use as @brandbutcher described), and that still can be added later (sorry if the "ditch it" language was a bit strong; it was a rethorical device but I think we're likely to need some matching flexibility later, we just don't know which one yet).

Even for the examples you give, we are still likely to be able to:

match if the relevant attributes (for example polar/rectangualr coords of a point, or RGB/HSV representations of a Color class) are available as attributes, which should happen in reasonable designed APIs.
use a reasonable guard (again, if your class is reasonably designed they probably though of something getting at this data using an if statement)
use something else (i'd rather define a parse_email(s: str) -> Tuple[str, str] than an Email custom matcher for most normal situations.)
And for cases that still fall through the cracks, what we may need is something completely different (for example, adding those deconstructors as callables inside __match_args__). I'm not actually suggesting this particular solution but pointing that there's life beyond __match__ as it is now

(...)


match c:
    case Polar(1.0, _): ...
    case Rect(x, y) if x**2 + y**2 == 1: ...

I know this is a quick example, but both branches match the same thing, so you'd keep one (and at this point if abs(c)==1.0: ... looks more readable to me... we don't have to always use match!). I just point it because I haven't seen a more or less real example of code where I wouldn't do something else rather than write my custom __match__

On the other hand, given some byte stream, we might want to try and decode it until we find the correct way to interpret the given data. The example I give here is almost ridiculously simple, but I hope to better convey the idea this way:
match some_byte_stream:
    case ASCII(text): ...
    case UTF8(text): ...
    case UTF16(text, endianess): ...

I assume there's some way to get the encoding from the stream, so. (I assume that there is because if there isn't how would the matcher work anyway?):

match some byte_stream:
    case Stream(encoding='ascii'): ...
    case Stream(encoding='utf-8'): ...

An example taken from the Scala book would be the following:

class Email:
    def __match__(text):
        if text.count('@') == 1:
            i = text.index('@')
            return (text[:i], text[i+1:])
        else:
            return None

match user_input:
    case Email(user_name, server):
        ...
    case user_name:
        ...

To me this look flatter, simpler code, still nice:

Email = namedtuple("Email", "user, server")
def parse_email(e):
    return Email(e.split('@')) if e.count('@') == 1 else None

if e := parse_email(user_input):
    do_something(e.username, e.server)
else:
    do_something(user_input)

Are you sure you want to throw all this power overboard?

For now, until the pep is approved, yes :)

Later, for most of the cases where my assumption fail, being able to have an extra callable that can be injected as a "property" is probably the easiest interface to use, instead of what __match__ does.

gvanrossum commented 4 years ago

In the end I still think we should defer the __match__ protocol. I hope @Tobias-Kohn will agree to this, so we don't have to post separate majority and minority opinions. But I appreciate Tobias' description of active patterns.

Active patterns are certainly interesting, and they all seem to have in common that they cannot be implemented using just __instancecheck__ overloading and a few extra properties and/or __match_args__. OTOH they can all be implemented using __match__() and a proxy class (and __match_args__). Presumably this is also what Taine was after with his Urlsplit example for AND patterns (#97).

But I think that active patterns will only become important once we're all using match statements daily -- a class like Email has no use when you want to parse an email address outside a match statement.

Also, the __match__ protocol currently on the docket isn't the easiest protocol for implementing active patterns: something that would actually support Tobias' pseudo-code would be better. So rather than spending more time debating the perfect protocol for custom matchers, let's reduce the size of the PEP.

Tobias-Kohn commented 4 years ago

Having given it some thought, I am ok with deferring this and concentrating on the fundamental infrastructure first (i.e. basic syntax and semantics, getting the 'simple' cases to work, etc).

gvanrossum / patma

Proposing a simpler match protocol: ditch it #115