gvanrossum / patma

Pattern Matching

Revisit load vs. store #90

Open gvanrossum opened 4 years ago

gvanrossum commented 4 years ago

A bunch of folks on python-dev brought up that it's confusing to have case foo: mean "assign to foo" and have case .foo: to alter the meaning to "compare to the value of foo".

I think we're going to need another round of this discussion.

viridia commented 4 years ago

For marking loads, I don't really have a preference.

I know that backticks were mentioned earlier, but I don't know if anyone specifically floated the idea of using backticks to indicate a load (as opposed to a store):

  case Colors.BLACK:
    # stuff
  case `BLACK`:
    # different stuff

Which makes BLACK kind of look like a literal if you squint.

This may be opening up a can of worms, but technically you could put any arbitrary expression within the backticks and the pattern machinery would treat it as a constant.

  case `Colors["BLACK"]`:
    # different stuff

gvanrossum commented 4 years ago

Tobias wrote about this. Scala allows backticks for loads, but apparently they aren't popular in patterns. (It also has UPPER and dot-in-middle.)

I was hoping to reserve backticks for tagged template strings like in JS.

Tobias-Kohn commented 4 years ago

@viridia The backticks were on the table somewhere, as this is exactly what, e.g., Scala does. The potential additional benefit that we could get out of backticks is that we could also mark names in other places. As I mentioned before somewhere: Jython (which is still based on Python 2) has the problem that print is a keyword but also a common method name in the Java libraries to be invoked. Hence, writing foo.`print` could potentially solve this issue. On the other hand, Guido's point was that we should be careful about using the backticks too lightly for such a rather narrow use case, and I tend to quite agree with him.

The problem with Colors["BLACK"] is something that immediately comes up as soon as we have any kind of special marker, rather than a rule. Even if we used $, say, you could still argue that you might write $(Colors["BLACK"]).


@gvanrossum I would go either for the purist or the pragmatic rule. I like the leading dot rule a lot because it feels like a clever way to make a general rule applicable, but I also understand the general reservations about it. Python is mostly quite robust in that tiny changes to the code seldom completely change the meaning of a program; I would regard this as one of the core features that make Python such a good choice in education.

I am not entirely convinced that we need the uppercase rule, and I see two potential issues with it. First, there might be just as much resistance as in the case of the leading dot (sorry, I lost the thread on the dev-mailing list, so this might have been discussed there already). Second, it could be very tempting/inviting to just do something like:

def foo(x, Y):
    match x:
        case Y:
            ...

The more thought I have given it, the more I am convinced that we really should be careful with 'load and compare' semantics. Given all the issues and potential pitfalls that come with 'load and compare' semantics, I would certainly prefer the purist approach with only literals and dotted names (even the dotted names are already a compromise from the purist view :wink:). If it turns out that we need some load marker later on, we can still introduce it.

viridia commented 4 years ago

Good points.

There's one other idea I want to throw out there (again, I am just brainstorming here, don't expect me to put up a serious defense of any of these ideas): since 'load and compare' uses equality to compare the name, use the equality operator:

match x:
  case ==BLACK:

Yeah it looks ugly. But it avoids the problem with leading dot that people raised, which is that dot is such a small glyph that people might not notice it. This certainly doesn't suffer from that problem. And, maybe ugliness is a virtue - in the sense that we might want to gently discourage people from using unqualified names.

Tobias-Kohn commented 4 years ago

The idea with the equality operator becomes quite weird as soon as we put it into the context of attributes/arguments.

match x:
    case Circle(color===BLACK):
        ...

Maybe, we would have to make sure there is a space separating it properly, i.e. writing color= ==BLACK. But anyway, I don't think it would really work.

Besides that, if we are considering switch-statement semantics, you could write:

match x:
    case ==BLACK: 
        ...
    case ==RED: 
        ...

instead of

if x == BLACK: 
    ...
elif x == RED:
    ...

Hence, I think this is a perfect example to show that we do not need to cover the switch statement, because if-elif-else chains already cover that nicely :).

stereobutter commented 4 years ago

I hope it won't be held against me when I throw (yet another) new idea into the ring at this stage. We all agree that the major issue of the PEP is that store and load are at odds with each other, yet there are compelling arguments for having both of them. Seeing the latest post from @viridia led me to think about this whole affair from the perspective of operators vs. keywords again. Could we not solve the issue of differentiating store from load with a keyword instead of an operator-like symbol (e.g. x?, x!, $x, &x) or an adornment (`x`, <x>)? To be honest, none of them feel quite pythonic (some of them even look a bit like Perl if you squint 🤢). So what could this look like with keywords instead:

HENRYS_FAV_COLOR = 'black'

match random_new_car():
    case Car(color=HENRYS_FAV_COLOR):  # load syntax
        print('Henry Ford approves')
    as Car(color=color):  # store syntax
        print(f'Ugh, not another {color} car')

Pros:

Neutral (either Pro or Con, depending on your take on the issue)


Edit:

A syntax variant with keywords of same length

match random_new_car():
    case Car(color=HENRYS_FAV_COLOR): 
        print('Henry Ford approves')
    with Car(color=color): 
        print(f'Ugh, not another {color} car')

Tobias-Kohn commented 4 years ago

If I try to summarise @SaschaSchlemmer's suggestion, there are three levels at which the switch/load and the match/store semantics could potentially coexist:

  1. Coarse grained: by having both a switch and a match statement;
  2. Sascha's middle path: by having two different types of case clauses;
  3. Fine grained: by controlling the semantics of individual tokens.

The new idea is this middle path, where we differentiate between load and store semantics on a clause level. @SaschaSchlemmer: please correct me if I am wrong.

Of these possibilities, the first one is clearly beyond the scope of this PEP since a switch statement could be introduced completely independent of our work here (whether a switch statement makes sense is a different story, of course). The second one now marries the two concepts just enough to consolidate them under a common umbrella: the top match statement.

I am highly sceptical as to whether this approach could work, as I see a couple of issues here. In short: I think it ends up being an overly complex solution that does not fully solve the problem. We'd pay a high price for little gain.

stereobutter commented 4 years ago

@Tobias-Kohn You summarized my train of thought pretty well (although I didn't explicitly think about the 1. level since this use case is already covered pretty well by if ... elif ... else). It seemed to me that cramming all the features (store and load) together (3. level) makes match quite powerful but maybe just too hard to explain (and thus sell the PEP).

Tobias-Kohn commented 4 years ago

@SaschaSchlemmer It seems that we are generally quite in agreement with only minor points where we differ. And I would certainly be interested if you know of a compelling example that shows that the two case clauses really make sense.

having two keywords attached to match also is a two-way door [...]

While I totally agree with you that this could (and probably should) be addressed in another PEP, this really is a one-way door. Once we introduce two different case clauses, there is no going back. We should therefore not do something like this lightly.

stereobutter commented 4 years ago

@Tobias-Kohn I meant that the two-way door is that we could introduce match and a case clause with store semantics now and decide later whether we'd like load semantics at the token level or via another clause. No need for deciding for/against the second case clause now (except maybe for some foresight in naming the case clause proposed in this pep)

gvanrossum commented 4 years ago

I find it unacceptable that load vs. store applies to the whole clause. Also when I first read Sascha’s first example I didn’t understand it because I didn’t notice the ‘as’.

Maybe we could debate the UPPERCASE rule, and decide first two preferences. In Scala the UPPER rule seems to work well. Does Rust have it?

stereobutter commented 4 years ago

@Tobias-Kohn to be honest I have not seen a convincing example where I'd prefer load semantics (except for literal values) over using store semantics and an appropriate guard.

brandtbucher commented 4 years ago

I still dislike the uppercase rule, because the language is now enforcing a convention (and one that may not always be "correct" in this context). It also only enforces it in this very narrow use case. Both of these points make it feel more "bolted-on" than the other options.

There are also cases that aren't obvious to me:

It's also worth considering how easy it is to correct unintentional stores when they're found. I've recently added a syntax warning for some trivial cases (prompted by a recent mailing list discussion, and not pushed yet):

>>> match 42:
...     case foo: pass
...     case bar: pass
...     case _: pass
... 
<stdin>:2: SyntaxWarning: unguarded name capture pattern makes remaining cases unreachable; did you forget a leading dot?
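As a point of comparison, in the match statement as released in Python 3.10 this situation is, as far as I can tell, rejected at compile time rather than merely warned about. A small sketch that checks this without crashing the script (the variable names source and error are illustrative):

```python
# The same shape as the session above: an unguarded capture pattern
# followed by further case clauses that can never be reached.
source = """
match 42:
    case foo: pass
    case bar: pass
    case _: pass
"""

try:
    compile(source, "<example>", "exec")
except SyntaxError as exc:
    error = exc
else:
    error = None

print(type(error).__name__, error)
```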

This simple action of adding a . in one place becomes more complicated for some of the alternatives:

I'll need to think about this more. Right now I pretty strongly prefer "PEP" and "purist", and pretty strongly dislike "pragmatic" and "compromise" ("pragmatic" and "purist" are pretty loaded names when discussing Python language design, by the way... :wink:).

Strong strong strong dislike of `load`, though, because they look like strings and are a total pain to discuss/document in markup environments (this sentence alone has 11 ` characters in it).

brandtbucher commented 4 years ago

Either way, I think it's important to constantly emphasize (especially when discussing name patterns) that we are creating this feature specifically for destructuring, not switching. That should help reduce pushback from people who want to adorn stores rather than loads (or feel that rules like "purist" aren't powerful enough).

natelust commented 4 years ago

@gvanrossum I don't believe it does. To my knowledge, in Rust you must use either a match guard or the binding operator in cases like these (I am not a Rust expert, though). The binding operator loads, compares, then stores; an example can be found here (it shows matching a pattern, but it may be a variable as well in a limited sense). In Python (using the same at symbol) that might be spelled

number_of_doors=4
match random_new_car():
    case Car(color=color, doors=doors@number_of_doors):
        print(f"This is a {color} car guaranteed to have {doors} doors")

where the variable is stored in doors, and I guess it could be left as _ in case you only wanted a constraint.

Edit: Fixed a typo. And I want to highlight that I think syntax like this has been discussed and was not favored; I only wanted to compare to what is in Rust.

Tobias-Kohn commented 4 years ago

Big +1 from me for @brandtbucher pointing out the difficulties with the uppercase rule! I hadn't thought of that, but I think these two issues (leading underscores and non-Latin names) are quite valid. Of course, a firm rule will answer these questions, but it shows nonetheless that it might not be quite as straightforward as at least I had thought.

I also like the SyntaxWarning! Very nice indeed!

While I am certainly not too eager to go for the backticks rule, I am not entirely sure whether the use of the language in markdown can be a strong concern. After all, it would be intended as a rather 'obscure' feature to be used sparingly.

brandtbucher commented 4 years ago

I also like the SyntaxWarning! Very nice indeed!

I knew you would like that.

I am not entirely sure whether the use of the language in markdown can be a strong concern.

Alright, you're in charge of writing the RST docs if we go this route. :wink:

dmoisset commented 4 years ago

Just to double check, is there anyone here who is still against default bind (store) semantics and prefers evaluate (load)? I know I mentioned some misgivings at some point but I'm generally on board with binding by default (I'm asking because of brandt's comment about «help reduce pushback from people who want to adorn stores rather than loads»)

Tobias-Kohn commented 4 years ago

The uppercase rule builds on the convention and idea that constants are written in uppercase letters. In Scala (where the uppercase rule is applied), there is also the Java convention of writing all classes with an uppercase letter. This means that, e.g., the load semantics of Point in Point(x, y) is already established by the name Point itself.

In Python, we face several difficulties with this rule:

In favour of the uppercase rule, we find that it is quite simple, and solves the load/store problem without additional syntactic clutter. It thus has the potential to be a viable compromise between the two groups. On the other hand, having load and compare semantics for dotted names seems to cover enough cases as far as I am concerned.

gvanrossum commented 4 years ago

We seem to have agreement that dot-in-middle (a.b) is in and leading-dot (.b) is out. Also that stores don't need sigils, and that we'd rather not use sigils or other markers for loads.

Which leaves the choice: Do we use some form of the UPPERCASE rule or not? Let's have a vote among the authors.

If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction but I don't see that as a show-stopper. (@thautwarm, can you help here?)

brandtbucher commented 4 years ago

I vote no uppercase.

I still like the leading dot, actually... though I recognize that adding it back later is painless.

Tobias-Kohn commented 4 years ago

I also vote no uppercase.

However: as I understand it, the Unicode standard seems to have an "UPPERCASE" flag for each character that specifies whether it is uppercase or not. Ignoring leading underscores also seems reasonable enough. But since my reservations are primarily about aspects other than whether we can determine if something is uppercase or not, I am still in favour of not implementing this rule.

viridia commented 4 years ago

Note that even with the purist approach, there are ways to match unqualified names, using either guards or custom matchers. I recognize that using it this way is quite ugly and verbose - it might make sense if you had a match statement with a bunch of qualified names, and needed one special case for an unqualified name - but if you had a bunch of unqualified names the burden would be great enough as to force most people not to use a match statement at all.

gvanrossum commented 4 years ago

@viridia Could you vote? If you're against UPPER it's decided. If you're for we'll have to ask Ivan.

Regarding uppercase for non-Latin alphabets, Unicode has several letter categories, and "Lo" (Letter, other) is neither lowercase nor uppercase -- and there are 127,004 of those. I'm not sure but it looks like that includes (almost) all CJK "letters".

If we're still interested after the vote I can ask around.

viridia commented 4 years ago

I am -0.5 on using uppercase.

One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.

Alternatively, the rule could be "Conforms to the Python code formatting standard for an enumeration constant", in which case I would be +0.

dmoisset commented 4 years ago

One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.

That won't work; that would mean that variable names with no latin letters (which may be all variables for code written in some non-English languages) would default to load semantics rather than store.
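The difference between the two candidate rules can be sketched with the standard unicodedata module. The helper names below are hypothetical, and stripping leading underscores follows the _Foo suggestion made earlier in the thread; this is an illustration of the objection, not a proposed implementation.

```python
import unicodedata

def starts_with_capital(name):
    # "Starts with a capital letter", ignoring leading underscores.
    stripped = name.lstrip("_")
    return bool(stripped) and unicodedata.category(stripped[0]) == "Lu"

def contains_no_lowercase(name):
    # "Contains no lowercase letters": true for any name without an
    # "Ll" character, including names with no cased letters at all.
    return not any(unicodedata.category(ch) == "Ll" for ch in name)

for name in ["BLACK", "_Foo", "color", "名字"]:
    print(name, starts_with_capital(name), contains_no_lowercase(name))
```

The CJK name 名字 consists of "Lo" characters, so it is not a capital under the first rule but passes the second one, which is the case where a plain variable would silently get load semantics.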

jimbaker commented 4 years ago

Agreed. There are many disambiguating characters available. I'm personally in favor of ^ as a prefix, which would cause no ambiguity.

gvanrossum commented 4 years ago

We could keep the door ajar for some variant of the uppercase rule by stipulating that capture variable names shouldn't start with a capital letter (after stripping leading underscores).

This shouldn't affect users whose alphabets have no case distinction (though a future decision to use a leading uppercase letter to mark dot-free loads would). It also shouldn't affect any serious use of match/case -- PEP 8 is quite clear that locals should use all-lowercase, and while I've seen plenty of code that violates the recommendation of using UPPERCASE for named constants or CapWords for class names, I don't think I've seen much code using anything but lowercase for local variables except on a whim. I doubt that anyone would even notice if we snuck this into the implementation without telling anybody. :-)

thautwarm commented 4 years ago

Glad to see the authors voting for no uppercase.

Many including me believe using uppercases this way is bad for Python.

If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction but I don't see that as a show-stopper. (@thautwarm, can you help here?)

Of course.

It seems that "no uppercase" will be adopted, and maybe no need to consider alphabets without lowercase/uppercase concepts.

In case you still want some information about CJK languages: usually a CJK language user will expect all characters to be either uppercase or lowercase (because this is a routine way). However, instead of using case, CJK language users might prefer using 「名字」 as the load of variable 名字.

gvanrossum commented 4 years ago

Looks like we're going with just the dot-in-the-middle rule ("purist"). People can create a dummy class and put values in there:

def foo(pt, context):
    class c:
        ctx = context
    match pt:
        case Point(0, 0, context=c.ctx): ...

dmoisset commented 4 years ago

I imagine most people will not use the dummy class and instead go for:

def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c==context: ...

And it's what I've seen in Haskell and Rust (see for example Listing 18-27 here)

brandtbucher commented 4 years ago

This has been implemented. Still needs PEP though.

gvanrossum commented 4 years ago

I imagine most people will not use the dummy class and instead go for

def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c==context: ...

I'm not so sure. That idiom looks backwards: We want to compare a value, but instead we extract it and then add a guard -- but for the human reader, a guard is much more expensive to understand, because there are many other things you could test for in a guard. Plus now the reader is wondering, is c only used in the guard, or also in the block (the ... in your example), or in the code past the end of the whole match statement.

Also it would be repetitive if several case clauses need it.

adamwehmann commented 3 years ago

With the adoption of the dot-in-the-middle rule, it seems that the use of "." as a sigil, if one is ever needed in the future, could have a little extra weight to it by analogy. Surprisingly, I haven't seen* it explicitly stated or discussed anywhere, although it seems likely it was the intention, that the two combined rules would have compared nicely with the traversal rules of relative imports (and the current working directory concept on the Linux file system). Essentially, in this context, if I understand correctly, as foo.bar is matching the constant bar loaded from the foo namespace, .bar could be said to be matching the constant loaded from the current one. Realizing this while reflecting on the PEP updates supplied a missing motivation that took the original PEP's constant patterns from arbitrary to making more sense to me personally, so I don't know if other readers might have missed this connection as well; other reasons for not keeping it in the PEP exist, of course.

Apologies if this is obvious and/or unwanted noise.

*searching this repo, python-dev, python-ideas, the PEP versions