gvanrossum / patma

Pattern Matching

Consider alternative wildcards? #92

Open gvanrossum opened 4 years ago

gvanrossum commented 4 years ago

People seem uneasy with our use of _ as a wildcard. Some proposals that have been made:

This seems to be an important thing to either revisit or explain better (and all of these deserve a place in Rejected Ideas at the least).

There are decent arguments against * and ... (see one of my responses on python-dev).

There is one consideration that makes me look twice at ?: If the opposition against plain name binding gets stronger, we could propose that to bind a variable x you have to write ?x, and then plain ? would be a very natural consequence.

But even without that, I would say that this does not look so bad:

match f():
    case [0, ?]: ...
    case [?, 0]: ...
    case [?, ?]: ...

Tobias-Kohn commented 4 years ago

As you have pointed out, I agree that * and ... are bad choices because (i) they already have a specific meaning (or even several in case of *) and because (ii) they suggest a plural where there is none. Hence, I would not really consider either of these two alternatives.

The ? is a viable alternative, and I agree that ?x and ? then look like a natural pair for name binding. Although I could live with it, I am not particularly happy with this alternative.

Issue #73 takes away much of the speciality that we initially assigned to _. In the end, the only difference between _ and any other name that still remains is that an "assignment"/binding to _ just throws away the value. In other words: pattern matching itself will never change the value of _.

One of the arguments brought up against _ mentions that libraries might already use _ as a valid identifier for something else. This is actually a red herring, because the general use of _ does not clash in any way with pattern matching here. You can still write _('...') for localisation, say, even inside a case block. The only thing that is not possible is to redefine what _ means by pattern matching directly. Hence, pattern matching even cooperates nicely with such libraries!

Another question is whether we actually need a special wildcard. Why not just treat _ like any other name, bind it to values and leave it to the programmer to not care? The reason lies in patterns like case (_, _), to give a very brief example. Because _ is a name that does not bind, you can use it multiple times and even in alternatives like case f(x, _) | g(x). In principle, yes, we could do without a wildcard. But IMHO it would severely hurt readability because every possible position would require a new name that is then never ever used...

brandtbucher commented 4 years ago

It also lets the compiler take important shortcuts, such as:

match range(HUGE_INT):
    case [*_, last]:  # No iteration or copying happens here.
        ...

Whatever we choose as a wildcard has to look good in several places:

I think that this is as good a time as any to introduce a ? token. It checks all of the boxes above:

It also has the potential to solve our other sticky issue: load vs. store. Since all of the above are some abstraction of "store", we can denote named stores with name?. This solves the "foot-gun" and "class patterns look like construction" issues, and still looks good:

case Point(x?, y?):
case [start_token, *info?, end_token]:  # Mixed load-store example.

The breakthrough here is a wildcard just looks like a name capture without... a name! Which is what it is!

You'll notice that I prefer it after the name. Let's keep Python reading like English. :wink:

Tobias-Kohn commented 4 years ago

The thing with case Point(x?, y?): is that, to my eyes, it actually looks more like load and compare semantics "is this an x/y?", but I might just be strange in this regard. IMHO, the other way round with ?x would fit slightly better with what it is supposed to mean.

That being said, I still feel that a plain x in pattern matching should have store semantics so that we would not need to write ?x or x? in the first place :wink:.

You'll notice that I prefer it after the name. Let's keep Python reading like English. :wink:

Yes. I fully agree with that.

brandtbucher commented 4 years ago

At first glance, I sort of agree with you. But I think that those asking for explicitly-marked stores (rather than loads, as it currently stands) make great points.

I can see myself being badly bitten by forgetting (or not seeing!) a dot 20 years from now, but I've already trained my brain to see x? as "store x" in 30 minutes. It helps because it's new, I guess!

gvanrossum commented 4 years ago

In my brief explorations with the prototype implementation I agree -- several times I've found myself writing examples like case RED: instead of case .RED:. I don't worry much about reading it -- it's generally obvious what's meant, since the difference is usually between case x: and case .RED:, so there are other clues available to the reader besides the dot. Alas, the parser cannot use those clues. (We've explored checking whether the name is already bound or not and shrank back in horror!)

Still, the only two alternatives available seem to be the current proposal or something using ?. :-(

brandtbucher commented 4 years ago

I see ? as a significant improvement over . and e.g. case-sensitivity, mostly in that it cleanly solves three or so of our problems.

This is a novel case that, I feel, justifies novel syntax. It's an obvious indicator that something special is happening, but without being intrusive or inconsistent.

If Python-Dev responds positively to a simple yes/no on ?, I'd say we pull the trigger.

brandtbucher commented 4 years ago

I'll also note that ? leaves open the door for more complex assignment targets later (a[b]?, a.b?). The current dot syntax pretty much locks us up, since we already allow things like .a.b.

gvanrossum commented 4 years ago

But why would we want more complex assignment targets? That seems an anti-pattern.

brandtbucher commented 4 years ago

I don't necessarily think they're a good idea, but allowing us future flexibility is a small positive, in my opinion.

viridia commented 4 years ago

I am fine with using ? to mean both wildcard and name-binding (in fact, some of my earlier proposals for name binding incorporated this, or $.)

There are a few reasons why I prefer ? to be a prefix rather than a suffix:

gvanrossum commented 4 years ago

I updated the PEP with text about * and ...; but of course we still need to deal with the question mark in the room.

viridia commented 4 years ago

I added a branch showing what expr.py would look like with question mark prefixes for variable bindings.

Delta456 commented 4 years ago

@viridia I agree that it should be a prefix.

My main reason for not using _ as a wildcard is that it is a valid identifier, so it would just be a special case in match.

thautwarm commented 4 years ago

@Delta456 Consider wildcards in nested patterns such as C(Sub1(_, 1), Sub2(...)). What's your alternative for wildcard patterns?

Delta456 commented 4 years ago

@thautwarm ? seems more suitable to me TBH.

thautwarm commented 4 years ago

  1. Introducing ? will affect a lot of other things, for instance PEP 505. Adopting it before those interactions have been considered as a whole is problematic.

  2. A wildcard pattern does not question anything. Familiar uses of ? concern predicates or special checks, such as how ? is used in various regular expression variants. Could you please explain how ? relates to a wildcard?

Delta456 commented 4 years ago

  1. I will have to look at that; I didn't see it earlier.

  2. After seeing the discussions above, it looks better than * and ....

viridia commented 4 years ago

Thank you for pointing out the existence of PEP 505, I wasn't aware of that (I had meant to ask if there was a PEP of this kind, since I use this same feature frequently in TypeScript as part of my day job).

I don't believe there is any conflict with pep 505 - that proposal uses ? as a suffix operator. The two uses can easily and unambiguously be distinguished by the parser, similar to the way it does for unary minus vs. infix subtract.

Stepping back for a moment, I believe what is being proposed is to kill two birds with one stone:

The ? operator can serve both purposes. Used alone, it represents a wildcard. Used as a prefix operator, it introduces a named pattern variable. While these two uses are different, they can be unified in a single coherent concept: ? means "match anything", and if it is followed by a name, then whatever is matched is stored in that name, otherwise the value is thrown away.

However, I don't want to start a discussion of named pattern variables here, there is another topic for that: #90
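
The unified reading of ? described above ("match anything", and store the value only if a name follows) can be modelled with a toy matcher. This is purely an illustrative sketch, not how the PEP or its prototype works: patterns are represented as plain Python lists and strings, and match_pattern is a hypothetical helper.

```python
def match_pattern(pattern, value, bindings):
    """Return True if value matches pattern, recording "?name" captures.

    Toy model: "?" is a wildcard, "?name" captures, lists match
    element-wise, and anything else is compared with ==.
    Bindings made before a later failure are not rolled back.
    """
    if isinstance(pattern, str) and pattern.startswith("?"):
        name = pattern[1:]
        if name:  # "?x": match anything and store it under x
            bindings[name] = value
        return True  # bare "?": match anything and discard it
    if isinstance(pattern, list):
        return (
            isinstance(value, list)
            and len(pattern) == len(value)
            and all(match_pattern(p, v, bindings) for p, v in zip(pattern, value))
        )
    return pattern == value


captured = {}
wildcard_ok = match_pattern([0, "?"], [0, 5], captured)   # nothing stored
capture_ok = match_pattern(["?x", 0], [7, 0], captured)   # stores x
mismatch = match_pattern([0, "?"], [1, 5], {})            # literal 0 != 1
```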

thautwarm commented 4 years ago

@viridia

I don't believe there is any conflict with pep 505 - that proposal uses ? as a suffix operator. The two uses can easily and unambiguously be distinguished by the parser, similar to the way it does for unary minus vs. infix subtract.

First, I think the syntax of PEP 505 has not been finalized yet. It is too early to say "no conflicts" unless we can give a convincing explanation for this.

Second, considering parser ambiguities alone is not sufficient; the consistency of semantics is important as well. A good example of this is the use of * in lvalue and rvalue positions.

If ? were introduced here, and PEP 505 were introduced later, it would produce a semantic gap.

Only when most users feel "okay, this is consistent" is there no consistency problem.

The ? operator can serve both purposes ... ? means "match anything"

? has not commonly been used in this way so far. We should at least give people some familiarity with "? = match anything".

thautwarm commented 4 years ago

By gathering the discussions of this issue and #90, I have a proposal.

  1. Use a prefix/suffix operator to indicate store context.

    I personally dislike ? and *, and _ has a lot of drawbacks, so please let me temporarily use ! here. ! is used to indicate side effects in many programming languages, such as Rust, Julia, and Ruby.

    case C(str, !str) looks good to me. You may not agree, but the choice of store operator does not affect this proposal.

  2. Implement the __match__ protocol for the builtin object any.

Then the syntax could look like this:

match value:
    case int:
        ...
    case str:
        ...
    case C(!a):
        # do stuff with a
    case any:
        ...

It looks clarified and concise to me.

gvanrossum commented 4 years ago

Does ‘case int:’ mean ‘value == int’ or ‘isinstance(value, int)’?

Doesn’t feel clarified to me. ;-(

In general I think radical rewrites of the proposal are not welcome at this point. We need to decide #90 and then go to the Steering Council.

thautwarm commented 4 years ago

Sorry, it's value == int. Forget this.

Tobias-Kohn commented 4 years ago

@thautwarm I like your call to consider this not only from the parser's perspective, but first and foremost from the user's. However, this is very much in line with what we have been doing here all along :wink:.

Second, considering parser ambiguities alone is not sufficient; the consistency of semantics is important as well. A good example of this is the use of * in lvalue and rvalue positions.

Only when most users feel "okay, this is consistent" is there no consistency problem.

ambientnuance commented 4 years ago

If ? were introduced here, and PEP 505 were introduced later, it would produce a semantic gap.

A reasonable compromise could be to have 505's None-aware operator always be ??, rather than using ?? and ? in different contexts. In my view, this creates internal consistency in 505 and makes its usage more obvious in the indexing and access cases (circuit.break?.access.stuff | circuit.break??.access.stuff).

In relation to the match/case construct, it seems important to prioritise semantic clarity in a flow control feature over a comparatively niche operator. That applies if ? is used in this PEP, for which I think @brandtbucher's update to expr.py in #105 makes a strong case.
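
For readers unfamiliar with PEP 505, the None-aware operators under discussion can be approximated in today's Python with two small helpers. The helper names are hypothetical, and the sketch is only rough: unlike the proposed operator, coalesce evaluates its default eagerly.

```python
from types import SimpleNamespace


def none_aware_attr(obj, name):
    """Rough equivalent of the proposed obj?.name attribute access."""
    return None if obj is None else getattr(obj, name)


def coalesce(value, default):
    """Rough equivalent of the proposed value ?? default."""
    return default if value is None else value


config = SimpleNamespace(timeout=30)  # illustrative object only
present = none_aware_attr(config, "timeout")
absent = none_aware_attr(None, "timeout")
```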

thautwarm commented 4 years ago

A reasonable compromise could be to have 505's None-aware operator always be ??

This seems good to me, but ? still looks too strange as a wildcard.

Since * seems to be rejected, I want to add a new option: how about /?

thautwarm commented 4 years ago

Or maybe use the pass keyword as the wildcard in patterns?

This fits Python's traditional look, and pass already has familiar semantics that suit a wildcard.

match xxx:
    case C(pass, yyu): ...

ambientnuance commented 4 years ago

If '?' were to be used here, I can see a way for it to take on the role of a kind of 'query' character across different contexts. Maybe it's clumsy, but it makes sense in my head at the moment.

match obj:
    case (0, ?):    # Don't know what goes here
    case (0, x?):   # What are you, given your place in the pattern?
    case (0, ?x):   # Don't know what goes here. Let's give it a name
    case (0, int?):  # Are you an int? - int() is a good alternative

PEP 505: Modified to use a two-character operator, with '_' denoting None.

obj??.attribute  # Are you anything but None? If so, access attribute

seq ?_= []  # Are you None? If so, initialise as list
seq ??= []  # Are you anything but None? If so, reset to a list

Typing: Already a pseudo-query, so '?' isn't used as directly as above. None seems to be the only alternative one would care about, outside of Unions (i.e. could '?' query anything else here?).

x: int?   # Optional[int]
x: int?_  # Optional[int] - more verbose, but consistent with the 'ask if None' pattern

EDIT: The intent of this comment is not to suggest an extension or re-work of PEP 622. It is strictly a hypothetical roadmap for a common usage of the '?' character, in response to concerns regarding it being a new special character.

brandtbucher commented 4 years ago

@ambientnuance, please slow down. Your stream-of-consciousness brainstorming session is way outside the scope of this issue (and even the PEP). It belongs in a blog post, not our issue tracker.

I'll repeat Guido's comment here:

In general I think radical rewrites of the proposal are not welcome at this point.

EDIT: Thanks for removing the messages.

dmoisset commented 4 years ago

But why would we want more complex assignment targets? That seems an anti-pattern.

I pretty much could see myself writing an __init__ that takes a JSON-like dict and matches everything in it to self.foo. That is:

class ErrorResponse:
    def __init__(self, jsondata):
        match jsondata:
            case {'status': int(self.status?), 'message': str(self.message?)}:
                pass # Or do some extra initialization
            case _: raise TypeError()

I know that I could capture variables and then assign to self inside the block, but this didn't look that evil to me :)

gvanrossum commented 4 years ago

No, I don’t want to encourage that.

viridia commented 4 years ago

I am -1 on making the None-aware operators more verbose. As someone who uses these operators all the time (my day job uses TypeScript), I would not want to add visual clutter to such a useful operator.

From a purely parsing standpoint, there is no confusion, because patterns live in a separate parsing context (same as with type hints).

Tobias-Kohn commented 4 years ago

As I just wrote in the thread for issue #93, it seems like there is basically no mainstream language with support for pattern matching that does not use _ as a wildcard. Hence, Python would deviate here from an otherwise well-established standard.

From what I gathered, the main reason why some people are not comfortable with _ as a wildcard is that _ is a valid name and is used elsewhere, for example to denote a function (the dev mailing list mentions i18n usage). However, I really feel this is just an example of bad communication on our part. The only special thing about _ is that it will not bind a value, so that it can be used repeatedly in a pattern. This also means, however, that it does not interfere with _ being used as a function in any way! On the contrary, it preserves a global meaning/binding of _.

Given that the wildcard _ seems to be a widely established and international standard for pattern matching, it would be somewhat ironic if we were the only ones not to use it because of internationalisation concerns :wink:.

gvanrossum commented 4 years ago

Okay, marking as rejected and needs more pep.