Revisit load vs. store - Githubissues

gvanrossum commented 4 years ago

A bunch of folks on python-dev brought up that it's confusing to have case foo: mean "assign to foo" while case .foo: alters the meaning to "compare to the value of foo".

I think we're going to need another round of this discussion.

viridia commented 4 years ago

For marking loads, I don't really have a preference.

I know that backticks were mentioned earlier, but I don't know if anyone specifically floated the idea of using backticks to indicate a load (as opposed to a store):

  case Colors.BLACK:
    # stuff
  case `BLACK`:
    # different stuff

Which makes BLACK kind of look like a literal if you squint.

This may be opening up a can of worms, but technically you could put any arbitrary expression within the backticks and the pattern machinery would treat it as a constant.

  case `Colors["BLACK"]`:
    # different stuff

gvanrossum commented 4 years ago

Tobias wrote about this. Scala allows backticks for loads, but apparently they aren't popular in patterns. (It also has UPPER and dot-in-middle.)

I was hoping to reserve backticks for tagged template strings like in JS.

Tobias-Kohn commented 4 years ago

@viridia The backticks were on the table somewhere, as this is exactly what, e.g., Scala does. The potential additional benefit that we could get out of backticks is that we could also mark names in other places. As I mentioned before somewhere: Jython (which is still based on Python 2) has the problem that print is a keyword but also a common method name in the Java libraries. Hence, writing foo.`print` could potentially solve this issue. On the other hand, Guido argued that we should be careful with using the backticks too lightly for such a rather narrow use case, and I tend to quite agree with him.

The problem with Colors["BLACK"] is something that immediately comes up as soon as we have any kind of special marker, rather than a rule. Even if we used $, say, you could still argue that you might write $(Colors["BLACK"]).

@gvanrossum I would go either for the purist or the pragmatic rule. I like the leading dot rule a lot because it feels like a clever way to make a general rule applicable, but I also understand the general reservations about it. Python is mostly quite robust in that tiny changes to the code seldom completely change the meaning of a program; I would regard this as one of the core features for making Python such a good choice in education.

I am not entirely convinced that we need the uppercase rule, and I see two potential issues with it. First, there might be just as much resistance as in the case of the leading dot (sorry, I lost the thread on the dev-mailing list, so this might have been discussed there already). Second, it could be very tempting/inviting to just do something like:

def foo(x, Y):
    match x:
        case Y:
            ...

The more thought I have given it, the more I am convinced that we really should be careful with 'load and compare' semantics. Given all the issues and potential pitfalls that come with 'load and compare' semantics, I would certainly prefer the purist approach with only literals and dotted names (even the dotted names are already a compromise from the purist view :wink:). If it turns out that we need some load marker later on, we can still introduce it.

viridia commented 4 years ago

Good points.

There's one other idea I want to throw out there - again, I am just brainstorming here, don't expect me to put up a serious defense of any of these ideas - is to say, well, since the 'load and compare' is using equality to compare the name, then use the equality operator:

match x:
  case ==BLACK:

Yeah it looks ugly. But it avoids the problem with leading dot that people raised, which is that dot is such a small glyph that people might not notice it. This certainly doesn't suffer from that problem. And, maybe ugliness is a virtue - in the sense that we might want to gently discourage people from using unqualified names.

Tobias-Kohn commented 4 years ago

The idea with the equality operator becomes quite weird as soon as we put it into the context of attributes/arguments.

match x:
    case Circle(color===BLACK):
        ...

Maybe, we would have to make sure there is a space separating it properly, i.e. writing color= ==BLACK. But anyway, I don't think it would really work.

Besides that, if we are considering switch-statement semantics, you could write:

match x:
    case ==BLACK: 
        ...
    case ==RED: 
        ...

instead of

if x == BLACK: 
    ...
elif x == RED:
    ...

Hence, I think this is a perfect example to show that we do not need to cover the switch statement, because if-elif-else chains already cover that nicely :).

stereobutter commented 4 years ago

I hope it won't be held against me when I throw (yet another) new idea into the ring at this stage. We all agree that the major issue of the pep is that store and load are at odds with each other, yet there are compelling arguments for having both of them. Seeing the latest post from @viridia led me to think about this whole affair from the perspective of operators vs. keywords again. Could we not solve the issue of differentiating store from load with a keyword instead of an operator-like symbol (e.g. x?, x!, $x, &x) or an adornment (`x`, <x>)? To be honest none of them feel quite pythonic (some of them even look a bit like Perl if you squint 🤢). So what could this look like with keywords instead:

HENRYS_FAV_COLOR = 'black'

match random_new_car():
    case Car(color=HENRYS_FAV_COLOR):  # load syntax
        print('Henry Ford approves')
    as Car(color=color):  # store syntax
        print(f'Ugh, not another {color} car')

Pros:

the same clutter free syntax can be used for both load and store
both FP-zealots and switch-aficionados get their shiny new toy and both use cases are clearly separated (yet can still be composed; see the next bullet point). This basically doubles the number of people who will want this pep to get accepted instead of fighting in the trenches for their personal favorite.
the syntax maps well to problems where there are a few special cases one likes to compare against and one (or more) general cases that have to be matched against a pattern. If there were no load syntax at all, everything would need to be spelled either as a guard or using a combination of match and if ... elif ... else.
we could allow arbitrary expressions for load semantics e.g. case Car(color=people['Henry'].favourite_color()) because there is no weird symbol or adornment getting in the way.
avoids introducing special meaning to things like capitalization e.g. treating FOO or Foo with load semantics compared with foo

Neutral (either Pro or Con, depending on your take on the issue)

load and store semantics are defined for the whole arm/branch and not on the level of individual variables. In my personal opinion this makes the statement much clearer to read (and explain to beginners). On the other hand, use cases like the one suggested by @natelust, where one wants to use both store and load in one arm/branch, are not possible. In those cases one would have to use store semantics with an appropriate guard.
```
match obj:
    case Point(x, y, z, reference_frame=.reference_frame):
        return (x**2+y**2+z**2)**0.5
```
Cons:

yet another keyword (or an existing keyword used in a novel way)
there is the question of what keywords to use. Looking at the example above where I used case for load and as for store I feel that it would be aesthetically pleasing if both keywords had the same number of letters so that the code after the keywords would align vertically; maybe case and with would fit that bill? OTOH both keywords should also be the same type of word; maybe case for load and pattern for store?

Edit:

A syntax variant with keywords of same length

match random_new_car():
    case Car(color=HENRYS_FAV_COLOR): 
        print('Henry Ford approves')
    with Car(color=color): 
        print(f'Ugh, not another {color} car')

Tobias-Kohn commented 4 years ago

If I try to summarise @SaschaSchlemmer's suggestion, there are three levels at which the switch/load and the match/store semantics could potentially coexist:

Coarse grained: by having both a switch and a match statement;
Sascha's middle path: by having two different types of case clauses;
Fine grained: by controlling the semantics of individual tokens.

The new idea is this middle path, where we differentiate between load and store semantics on a clause level. @SaschaSchlemmer: please correct me if I am wrong.

Of these possibilities, the first one is clearly beyond the scope of this PEP since a switch statement could be introduced completely independent of our work here (whether a switch statement makes sense is a different story, of course). The second one now marries the two concepts just enough to consolidate them under a common umbrella: the top match statement.

I am highly sceptical as to whether this approach could work, as I see a couple of issues here. In short: I think it ends up being an overly complex solution that does not fully solve the problem. We'd pay a high price for little gain.

The overall direction of this PEP is to chip away anything we do not really need and concentrate on the absolute core set of features needed to have some basic, yet usable and versatile pattern matching. In contrast, the idea with two different case clause types follows a "let's have it all" approach—an approach with a historically very poor success rate I am afraid, and that rather goes against the Zen of Python.
As the suggestion also points out: it does not really solve the issue brought up by Nate. In fact, I wonder if there are any real use cases where I want to mix "switch" case clauses with "match" case clauses. As it clearly does not solve the fine grained use cases, and if there are no compelling examples why case and as/with should be mixed within the same statement, this basically falls back to the first, coarse grained, case. Hence, we better clearly separate the two completely (theoretically, a "switch-PEP" could still propose to reuse match as the top-keyword).

stereobutter commented 4 years ago

@Tobias-Kohn You summarized my train of thought pretty well (although I didn't explicitly think about the 1. level since this use case is already covered pretty well by if ... elif ... else). It seemed to me that cramming all the features (store and load) together (3. level) makes match quite powerful but maybe just too hard to explain (and thus sell the pep).

for the example @natelust brought forward (mixing load and store semantics in one arm/branch), I really think using a guard is the one obvious way to do it and argue that we should explicitly reject special syntax for this use case.
for the simpler use case of load and compare I am not so sure; people in this thread and elsewhere appear to like/expect the feature. Let me quote:

@gvanrossum

The main use case is definitely FP style structure unpacking. An early proposal didn’t even have constant value patterns. But named constants and enums are very much part of Python’s culture and we felt we had to support them.

Allowing case Car(color=HENRYS_FAV_COLOR) will replace a lot of if isinstance(some_car, Car) and some_car.color == HENRYS_FAV_COLOR. Notice here that this use case is not equivalent to a C-style switch statement (which should just be written using if FOO ... elif BAR ... else ...) but also features the same structure unpacking. There is also precedent in other languages that have similar features (albeit with some not so nice syntactical choices). Marrying both concepts at the 2. level felt like a good solution for the common cases to me.
having two keywords attached to match also is a two-way door in that we could introduce match with store semantics now and have (extended) load semantics with the second keyword later in another pep. This would also strip the current pep of the exception made for dotted names etc. and reduce match with the store-clause to binding variables and comparison with literal values.

Tobias-Kohn commented 4 years ago

@SaschaSchlemmer It seems that we are generally quite in agreement with only minor points where we differ. And I would certainly be interested if you know of a compelling example that shows that the two case clauses really make sense.

having two keywords attached to match also is a two-way door [...]

While I totally agree with you that this could (and probably should) be addressed in another PEP, this really is a one-way door. Once we introduce two different case clauses, there is no going back. We should therefore not do something like this lightly.

stereobutter commented 4 years ago

@Tobias-Kohn I meant that the two-way door is that we could introduce match and a case clause with store semantics now and decide later whether we'd like load semantics at the token level or via another clause. No need for deciding for/against the second case clause now (except maybe for some foresight in naming the case clause proposed in this pep)

gvanrossum commented 4 years ago

I find it unacceptable that load vs. store applies to the whole clause. Also when I first read Sascha’s first example I didn’t understand it because I didn’t notice the ‘as’.

Maybe we could debate the UPPERCASE rule, and decide first two preferences. In Scala the UPPER rule seems to work well. Does Rust have it?

stereobutter commented 4 years ago

@Tobias-Kohn to be honest I have not seen a convincing example where I'd prefer load semantics (except for literal values) over using store semantics and an appropriate guard.

brandtbucher commented 4 years ago

I still dislike the uppercase rule, because the language is now enforcing a convention (and one that may not always be "correct" in this context). It also only enforces it in this very narrow use case. Both of these points make it feel more "bolted-on" than the other options.

There are also cases that aren't obvious to me:

What about names that start with underscores? I often have private module-level names like _PATH or _START_DATE. Do we just skip over all underscores first? I believe Scala treats them as lowercase.
More generally, what about names using characters with no concept of upper- or lower-case?

It's also worth considering how easy it is to correct unintentional stores when they're found. I've recently added a syntax warning for some trivial cases (prompted by a recent mailing list discussion, and not pushed yet):

>>> match 42:
...     case foo: pass
...     case bar: pass
...     case _: pass
... 
<stdin>:2: SyntaxWarning: unguarded name capture pattern makes remaining cases unreachable; did you forget a leading dot?

This simple action of adding a . in one place becomes more complicated for some of the alternatives:

pragmatic: "... consider renaming to UPPERCASE?"
purist: "... consider refactoring to use a qualified (dotted) name?"

I'll need to think about this more. Right now I pretty strongly prefer "PEP" and "purist", and pretty strongly dislike "pragmatic" and "compromise" ("pragmatic" and "purist" are pretty loaded names when discussing Python language design, by the way... :wink:).

Strong strong strong dislike of `load`, though, because they look like strings and are a total pain to discuss/document in markup environments (this sentence alone has 11 ` characters in it).

brandtbucher commented 4 years ago

Either way, I think it's important to constantly emphasize (especially when discussing name patterns) that we are creating this feature specifically for destructuring, not switching. That should help reduce pushback from people who want to adorn stores rather than loads (or feel that rules like "purist" aren't powerful enough).

natelust commented 4 years ago

@gvanrossum I don't believe it does. To my knowledge, in Rust you must use either a match guard or their binding operator in cases like these (I am not a Rust expert though). The binding operator loads, compares, then stores; an example can be found here (it shows matching a pattern, but it may be a variable as well in a limited sense). In Python (using their same at symbol) that might be spelled

number_of_doors = 4
match random_new_car():
    case Car(color=color, doors=doors@number_of_doors):
        print(f"This is a {color} car guaranteed to have {doors} doors")

where the variable is stored in doors, and I guess could be left as _ in the case you only wanted a constraint.

Edit: Fixed a typo. And I want to highlight that I think syntax like this has been discussed and was not favored; I only wanted to compare to what is in Rust.

Tobias-Kohn commented 4 years ago

Big +1 from me for @brandtbucher pointing out the difficulties with the uppercase rule! I hadn't thought of that, but I think these two issues (leading underscores and non-latin names) are quite valid. Of course, a firm rule will answer these questions, but it shows nonetheless that it might not be quite as straightforward as at least I had thought.

I also like the SyntaxWarning! Very nice indeed!

While I am certainly not too eager to go for the backticks rule, I am not entirely sure whether the use of the language in markdown can be a strong concern. After all, it would be intended as a rather 'obscure' feature to be used sparingly.

brandtbucher commented 4 years ago

I also like the SyntaxWarning! Very nice indeed!

I knew you would like that.

I am not entirely sure whether the use of the language in markdown can be a strong concern.

Alright, you're in charge of writing the RST docs if we go this route. :wink:

dmoisset commented 4 years ago

Just to double check, is there anyone here that is still against default bind(store) semantics and prefers evaluate(load) ? I know I mentioned some misgivings at some point but I'm generally onboard with binding by default (I'm asking because of brandt's comment about «help reduce pushback from people who want to adorn stores rather than loads »)

Tobias-Kohn commented 4 years ago

The uppercase rule builds on the convention and idea that constants are written in uppercase letters. In Scala (where the uppercase rule is applied), there is also the Java convention of writing all classes with an uppercase letter. This means that, e.g., the load semantics of Point in Point(x, y) is already established by the name Point itself.

In Python, we face several difficulties with this rule:

Many classes and types are written entirely in lowercase. The load semantics of class names must therefore be established by the following parenthesis rather than the name itself.
There is no such thing as a constant in the strict sense, nor is there anything that would prevent local variables and parameters from using uppercase letters. Since I would rather not have too much 'load and compare' semantics in the patterns, I see the doors wide open for misuse (but there might be differing opinions on that).
As Brandt has just pointed out: there are many possible names that are neither clearly uppercase, nor clearly not.
There is quite some resistance from people who are uncomfortable with the introduction of such a rule that has no precedent in Python so far.

In favour of the uppercase rule, we find that it is quite simple, and solves the load/store problem without additional syntactic clutter. It thus has the potential to be a viable compromise between the two groups. On the other hand, having load and compare semantics for dotted names seems to cover enough cases as far as I am concerned.

gvanrossum commented 4 years ago

We seem to have agreement that dot-in-middle (a.b) is in and leading-dot (.b) is out. Also that stores don't need sigils, and that we'd rather not use sigils or other markers for loads.

Which leaves the choice: Do we use some form of the UPPERCASE rule or not? Let's have a vote among the authors.

If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction but I don't see that as a show-stopper. (@thautwarm, can you help here?)

brandtbucher commented 4 years ago

I vote no uppercase.

I still like the leading dot, actually... though I recognize that adding it back later is painless.

Tobias-Kohn commented 4 years ago

I also vote no uppercase.

However: as I understand it, the Unicode standard seems to have an "UPPERCASE" flag for each character that specifies whether it is uppercase or not. Ignoring leading underscores also seems reasonable enough. But since my reservations are primarily on other aspects than whether we can determine if something is uppercase or not, I am still in favour of not implementing this rule.

viridia commented 4 years ago

Note that even with the purist approach, there are ways to match unqualified names, using either guards or custom matchers. I recognize that using it this way is quite ugly and verbose - it might make sense if you had a match statement with a bunch of qualified names, and needed one special case for an unqualified name - but if you had a bunch of unqualified names the burden would be great enough as to force most people not to use a match statement at all.

gvanrossum commented 4 years ago

@viridia Could you vote? If you're against UPPER it's decided. If you're for we'll have to ask Ivan.

Regarding uppercase for non-Latin alphabets, Unicode has several letter categories, and "Lo" (Letter, other) is neither lowercase nor uppercase -- and there are 127,004 of those. I'm not sure but it looks like that includes (almost) all CJK "letters".

If we're still interested after the vote I can ask around.

viridia commented 4 years ago

I am -0.5 on using uppercase.

One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.

Alternatively, the rule could be "Conforms to the Python code formatting standard for an enumeration constant", in which case I would be +0.

dmoisset commented 4 years ago

One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.

That won't work; that would mean that variable names with no latin letters (which may be all variables for code written in some non-English languages) would default to load semantics rather than store.
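
The two candidate rules can be compared with the standard library. In this sketch, `starts_upper` and `has_no_lowercase` are hypothetical helpers that only approximate the rules as worded in the thread; neither was ever implemented in this form.

```python
import unicodedata


def starts_upper(name):
    """'Starts with a capital letter', after stripping leading underscores."""
    stripped = name.lstrip("_")
    return bool(stripped) and stripped[0].isupper()


def has_no_lowercase(name):
    """'Contains no lowercase letters'."""
    return not any(ch.islower() for ch in name)


for name in ["BLACK", "_Foo", "foo", "名字"]:
    first = name.lstrip("_")[0]
    print(
        name,
        unicodedata.category(first),  # e.g. Lu, Ll, or Lo for caseless letters
        "starts_upper:", starts_upper(name),
        "no_lowercase:", has_no_lowercase(name),
    )
```

Under the second rule a caseless name such as 名字 counts as a load, which is the objection raised here; under the first rule it stays a store.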

jimbaker commented 4 years ago

Agreed. There are many disambiguating characters available. I'm personally in favor of ^ as a prefix, which would cause no ambiguity.

On Thu, Jul 2, 2020, 7:05 PM Daniel F Moisset notifications@github.com wrote:

One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.

That won't work; that would mean that variable names with no latin letters (which may be all variables for code written in some non-English languages) would default to load semantics rather than store.

gvanrossum commented 4 years ago

We could keep the door ajar for some variant of the uppercase rule by stipulating that capture variable names shouldn't start with a capital letter (after stripping leading underscores).

This shouldn't affect users whose alphabets have no case distinction (though a future decision to use a leading uppercase letter to mark dot-free loads would). It also shouldn't affect any serious use of match/case -- PEP 8 is quite clear that locals should use all-lowercase, and while I've seen plenty of code that violates the recommendation of using UPPERCASE for named constants or CapWords for class names, I don't think I've seen much code using anything but lowercase for local variables except on a whim. I doubt that anyone would even notice if we snuck this into the implementation without telling anybody. :-)

thautwarm commented 4 years ago

Glad to see the authors voting for no uppercase.

Many including me believe using uppercases this way is bad for Python.

If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction but I don't see that as a show-stopper. (@thautwarm, can you help here?)

Of course.

It seems that "no uppercase" will be adopted, and maybe no need to consider alphabets without lowercase/uppercase concepts.

In case you still want some information about CJK languages: usually a CJK language user will expect all characters to be either uppercase or lowercase (because this is a routine way). However, instead of using cases, CJK language users might prefer using 「名字」 as the load of variable 名字.

gvanrossum commented 4 years ago

Looks like we're going with just the dot-in-the-middle rule ("purist"). People can create a dummy class and put values in there:

def foo(pt, context):
    class c:
        ctx = context
    match pt:
        case Point(0, 0, context=c.ctx): ...

dmoisset commented 4 years ago

I imagine most people will not use the dummy class and instead go for:

def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c==context: ...

And it's what I've seen in Haskell and Rust (see for example Listing 18-27 here)

brandtbucher commented 4 years ago

This has been implemented. Still needs PEP though.

gvanrossum commented 4 years ago

I imagine most people will not use the dummy class and instead go for
def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c==context: ...

I'm not so sure. That idiom looks backwards: We want to compare a value, but instead we extract it and then add a guard -- but for the human reader, a guard is much more expensive to understand, because there are many other things you could test for in a guard. Plus now the reader is wondering, is c only used in the guard, or also in the block (the ... in your example), or in the code past the end of the whole match statement.

Also it would be repetitive if several case clauses need it.

adamwehmann commented 3 years ago

With the adoption of the dot-in-the-middle rule, it seems that the use of "." as a sigil, if one is ever needed in the future, could have a little extra weight to it by analogy. Surprisingly I haven't seen* it explicitly stated or discussed anywhere, although it seems likely it was the intention, that the two combined rules would have compared nicely with the traversal rules of relative imports (and the current working directory concept on the Linux file system). Essentially, in this context, if I understand correctly, as foo.bar is matching the constant bar loaded from the foo namespace, .bar could be said to be matching the constant loaded from the current one. Realizing this while reflecting on the PEP updates fulfilled a missing motivator that took the original PEP on constant patterns from arbitrary to making more sense to me personally, so I don't know if other readers might have missed this connection as well, not that other reasons for not maintaining it in the PEP don't exist, of course.

Apologies if this is obvious and/or unwanted noise.

*searching this repo, python-dev, python-ideas, the PEP versions

gvanrossum / patma

Revisit load vs. store #90
