gvanrossum opened 4 years ago
For marking loads, I don't really have a preference.
I know that backticks were mentioned earlier, but I don't know if anyone specifically floated the idea of using backticks to indicate a load (as opposed to a store):
case Colors.BLACK:
    # stuff
case `BLACK`:
    # different stuff
This makes BLACK look a bit like a literal, if you squint.
This may be opening up a can of worms, but technically you could put any arbitrary expression within the backticks and the pattern machinery would treat it as a constant.
case `Colors["BLACK"]`:
    # different stuff
Tobias wrote about this. Scala allows backticks for loads, but apparently they aren't popular in patterns. (It also has UPPER and dot-in-middle.)
I was hoping to reserve backticks for tagged template strings like in JS.
@viridia The backticks were on the table somewhere, as this is exactly what, e.g., Scala does. The potential additional benefit of backticks is that we could also use them to mark names in other places. As I mentioned somewhere before: Jython (which is still based on Python 2) has the problem that print is a keyword but also a common method name in the Java libraries that needs to be invoked. Hence, writing foo.`print` could potentially solve this issue. On the other hand, Guido's point was that we should be careful not to use backticks too lightly for such a narrow use case, and I quite agree with him.
The problem with Colors["BLACK"] is something that immediately comes up as soon as we have any kind of special marker, rather than a rule. Even if we used $, say, you could still argue that you might write $(Colors["BLACK"]).
@gvanrossum I would go for either the purist or the pragmatic rule. I like the leading-dot rule a lot because it feels like a clever way to make a general rule applicable, but I also understand the general reservations about it. Python is mostly quite robust in that tiny changes to the code seldom completely change the meaning of a program; I regard this as one of the core features that make Python such a good choice in education.
I am not entirely convinced that we need the uppercase rule, and I see two potential issues with it. First, there might be just as much resistance as in the case of the leading dot (sorry, I lost the thread on the dev-mailing list, so this might have been discussed there already). Second, it could be very tempting/inviting to just do something like:
def foo(x, Y):
    match x:
        case Y:
            ...
The more thought I have given it, the more I am convinced that we really should be careful with 'load and compare' semantics. Given all the issues and potential pitfalls that come with 'load and compare' semantics, I would certainly prefer the purist approach with only literals and dotted names (even the dotted names are already a compromise from the purist view :wink:). If it turns out that we need some load marker later on, we can still introduce it.
Good points.
There's one other idea I want to throw out there (again, I am just brainstorming here, don't expect me to put up a serious defense of any of these ideas): since 'load and compare' uses equality to compare the name, why not use the equality operator?
match x:
    case ==BLACK:
        ...
Yeah it looks ugly. But it avoids the problem with leading dot that people raised, which is that dot is such a small glyph that people might not notice it. This certainly doesn't suffer from that problem. And, maybe ugliness is a virtue - in the sense that we might want to gently discourage people from using unqualified names.
The idea with the equality operator becomes quite weird as soon as we put it into the context of attributes/arguments.
match x:
    case Circle(color===BLACK):
        ...
Maybe we would have to make sure there is a space separating them properly, i.e. writing color= ==BLACK. But anyway, I don't think it would really work.
Besides that, if we are considering switch-statement semantics, you could write:
match x:
    case ==BLACK:
        ...
    case ==RED:
        ...
instead of
if x == BLACK:
    ...
elif x == RED:
    ...
Hence, I think this is a perfect example to show that we do not need to cover the switch statement, because if-elif-else chains already cover that nicely :).
I hope it won't be held against me when I throw (yet another) new idea into the ring at this stage. We all agree that the major issue of the PEP is that store and load are at odds with each other, yet there are compelling arguments for having both of them. Seeing the latest post from @viridia led me to think about this whole affair from the perspective of operators vs. keywords again. Could we not solve the issue of differentiating store from load with a keyword instead of an operator-like symbol (e.g. x?, x!, $x, &x) or an adornment (`x`, <x>)? To be honest, none of them feel quite pythonic (some of them even look a bit like Perl if you squint 🤢). So what could this look like with keywords instead?
HENRYS_FAV_COLOR = 'black'
match random_new_car():
    case Car(color=HENRYS_FAV_COLOR):  # load syntax
        print('Henry Ford approves')
    as Car(color=color):  # store syntax
        print(f'Ugh, not another {color} car')
match and if ... elif ... else.
case Car(color=people['Henry'].favourite_color()), because there is no weird symbol or adornment getting in the way.
FOO or Foo with load semantics compared with foo
match obj:
    case Point(x, y, z, reference_frame=.reference_frame):
        return (x**2+y**2+z**2)**0.5
case for load and as for store: I feel that it would be aesthetically pleasing if both keywords had the same number of letters, so that the code after the keywords would align vertically; maybe case and with would fit that bill? OTOH, both keywords should also be the same type of word; maybe case for load and pattern for store?
A syntax variant with keywords of the same length:
match random_new_car():
    case Car(color=HENRYS_FAV_COLOR):
        print('Henry Ford approves')
    with Car(color=color):
        print(f'Ugh, not another {color} car')
If I try to summarise @SaschaSchlemmer's suggestion, there are three levels at which the switch/load and the match/store semantics could potentially coexist:
1. a separate switch and a match statement;
2. two different kinds of case clauses within a match statement;
3. load and store semantics mixed at the token level, within individual patterns.
The new idea is this middle path, where we differentiate between load and store semantics at the clause level. @SaschaSchlemmer: please correct me if I am wrong.
Of these possibilities, the first one is clearly beyond the scope of this PEP, since a switch statement could be introduced completely independently of our work here (whether a switch statement makes sense is a different story, of course). The second one now marries the two concepts just enough to consolidate them under a common umbrella: the top-level match statement.
I am highly sceptical as to whether this approach could work, as I see a couple of issues here. In short: I think it ends up being an overly complex solution that does not fully solve the problem. We'd pay a high price for little gain.
The overall direction of this PEP is to chip away anything we do not really need and concentrate on the absolute core set of features needed to have some basic, yet usable and versatile pattern matching. In contrast, the idea with two different case clause types follows a "let's have it all" approach—an approach with a historically very poor success rate I am afraid, and that rather goes against the Zen of Python.
As the suggestion also points out, it does not really solve the issue brought up by Nate. In fact, I wonder if there are any real use cases where I would want to mix "switch" case clauses with "match" case clauses. Since it clearly does not solve the fine-grained use cases, and if there are no compelling examples of why case and as/with should be mixed within the same statement, this basically falls back to the first, coarse-grained case. Hence, we had better separate the two completely (theoretically, a "switch PEP" could still propose to reuse match as the top-level keyword).
@Tobias-Kohn
You summarized my train of thought pretty well (although I didn't explicitly think about the first level, since this use case is already covered pretty well by if ... elif ... else). It seemed to me that cramming all the features (store and load) together (the third level) makes match quite powerful, but maybe just too hard to explain (and thus to sell the PEP).
For the example @natelust brought forward (mixing load and store semantics in one arm/branch), I really think using a guard is the one obvious way to do it, and I'd argue that we should explicitly reject special syntax for this use case.
For the simpler use case of load and compare, I am not so sure; people in this thread and elsewhere appear to like/expect the feature. Let me quote:
@gvanrossum
The main use case is definitely FP style structure unpacking. An early proposal didn’t even have constant value patterns. But named constants and enums are very much part of Python’s culture and we felt we had to support them.
Allowing case Car(color=HENRYS_FAV_COLOR) will replace a lot of if isinstance(some_car, Car) and some_car.color == HENRYS_FAV_COLOR. Notice that this use case is not equivalent to a C-style switch statement (that should just be written using if FOO ... elif BAR ... else ...), but also features the same structure unpacking. There is also precedent in other languages that have similar features (albeit with some not-so-nice syntactical choices). Marrying both concepts at the second level felt like a good solution for the common cases to me.
Having two keywords attached to match is also a two-way door, in that we could introduce match with store semantics now and add (extended) load semantics with the second keyword later in another PEP. This would also strip the current PEP of the exception made for dotted names etc., and reduce match with the store clause to binding variables and comparing with literal values.
@SaschaSchlemmer It seems that we are generally quite in agreement with only minor points where we differ. And I would certainly be interested if you know of a compelling example that shows that the two case clauses really make sense.
having two keywords attached to match also is a two-way door [...]
While I totally agree with you that this could (and probably should) be addressed in another PEP, this really is a one-way door. Once we introduce two different case clauses, there is no going back. We should therefore not do something like this lightly.
@Tobias-Kohn I meant that the two-way door is that we could introduce match and a case clause with store semantics now, and decide later whether we'd like load semantics at the token level or via another clause. No need to decide for or against the second case clause now (except maybe for some foresight in naming the case clause proposed in this PEP).
I find it unacceptable that load vs. store applies to the whole clause. Also when I first read Sascha’s first example I didn’t understand it because I didn’t notice the ‘as’.
Maybe we could debate the UPPERCASE rule and decide on our first and second preferences. In Scala the UPPER rule seems to work well. Does Rust have it?
@Tobias-Kohn to be honest I have not seen a convincing example where I'd prefer load semantics (except for literal values) over using store semantics and an appropriate guard.
I still dislike the uppercase rule, because the language is now enforcing a convention (and one that may not always be "correct" in this context). It also only enforces it in this very narrow use case. Both of these points make it feel more "bolted-on" than the other options.
There are also cases that aren't obvious to me, such as _PATH or _START_DATE. Do we just skip over all underscores first? I believe Scala treats them as lowercase.
It's also worth considering how easy it is to correct unintentional stores when they're found. I've recently added a syntax warning for some trivial cases (prompted by a recent mailing list discussion, and not pushed yet):
>>> match 42:
... case foo: pass
... case bar: pass
... case _: pass
...
<stdin>:2: SyntaxWarning: unguarded name capture pattern makes remaining cases unreachable; did you forget a leading dot?
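In Python 3.10 as released, this check became a hard error rather than a warning. A small sketch that compiles the session's code from a string, so the error can be observed at runtime (the exact message wording may differ between versions, so it is printed rather than hard-coded):

```python
source = """
match 42:
    case foo:
        pass
    case bar:
        pass
    case _:
        pass
"""

try:
    compile(source, "<example>", "exec")
    outcome = "compiled"
except SyntaxError as exc:
    # An irrefutable capture before other cases is rejected at compile time.
    outcome = f"SyntaxError: {exc.msg}"

print(outcome)
```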
This simple action of adding a . in one place becomes more complicated for some of the alternatives:
I'll need to think about this more. Right now I pretty strongly prefer "PEP" and "purist", and pretty strongly dislike "pragmatic" and "compromise" ("pragmatic" and "purist" are pretty loaded names when discussing Python language design, by the way... :wink:).
Strong strong strong dislike of `load`, though, because they look like strings and are a total pain to discuss/document in markup environments (this sentence alone has 11 ` characters in it).
Either way, I think it's important to constantly emphasize (especially when discussing name patterns) that we are creating this feature specifically for destructuring, not switching. That should help reduce pushback from people who want to adorn stores rather than loads (or feel that rules like "purist" aren't powerful enough).
@gvanrossum I don't believe it does. To my knowledge, in Rust you must either use a match guard or their binding operator in cases like these (I am not a Rust expert, though). The binding operator does a load, compares, then stores; an example can be found here (it shows matching a pattern, but it may be a variable as well in a limited sense). In Python (using the same at symbol) that might be spelled
number_of_doors = 4
match random_new_car():
    case Car(color=color, doors=doors@number_of_doors):
        print(f"This is a {color} car guaranteed to have {doors} doors")
where the variable is stored in doors, and I guess it could be left as _ if you only wanted a constraint.
Edit: Fixed a typo. And I want to highlight that I think syntax like this has been discussed and was not favored; I only wanted to compare it to what is in Rust.
Big +1 from me for @brandtbucher pointing out the difficulties with the uppercase rule! I hadn't thought of that, but I think these two issues (leading underscores and non-Latin names) are quite valid. Of course, a firm rule will answer these questions, but it shows nonetheless that it might not be quite as straightforward as I, at least, had thought.
I also like the SyntaxWarning! Very nice indeed!
While I am certainly not too eager to go for the backticks rule, I am not entirely sure whether the use of the language in markdown can be a strong concern. After all, it would be intended as a rather 'obscure' feature to be used sparingly.
I also like the SyntaxWarning! Very nice indeed!
I knew you would like that.
I am not entirely sure whether the use of the language in markdown can be a strong concern.
Alright, you're in charge of writing the RST docs if we go this route. :wink:
Just to double-check, is there anyone here who is still against default bind (store) semantics and prefers evaluate (load)? I know I mentioned some misgivings at some point, but I'm generally on board with binding by default (I'm asking because of Brandt's comment about «help reduce pushback from people who want to adorn stores rather than loads»).
The uppercase rule builds on the convention and idea that constants are written in uppercase letters. In Scala (where the uppercase rule is applied), there is also the Java convention of writing all class names with an initial uppercase letter. This means that, e.g., the load semantics of Point in Point(x, y) are already established by the name Point itself.
In Python, we face several difficulties with this rule:
In favour of the uppercase rule, we find that it is quite simple, and solves the load/store problem without additional syntactic clutter. It thus has the potential to be a viable compromise between the two groups. On the other hand, having load and compare semantics for dotted names seems to cover enough cases as far as I am concerned.
We seem to have agreement that dot-in-middle (a.b) is in and leading-dot (.b) is out. Also that stores don't need sigils, and that we'd rather not use sigils or other markers for loads.
Which leaves the choice: Do we use some form of the UPPERCASE rule or not? Let's have a vote among the authors.
If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction, but I don't see that as a show-stopper. (@thautwarm, can you help here?)
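As a thought experiment only (the rule was ultimately not adopted), the proposed classification could be sketched like this: strip leading underscores, then treat the name as a load if its first character is an uppercase letter. The function name is hypothetical; unicodedata.category is the standard-library call that returns the Unicode general category, where "Lu" is an uppercase letter and "Lo" (which covers most CJK characters) has no case at all, so such names fall through to capture.

```python
import unicodedata


def uses_load_semantics(name: str) -> bool:
    """Hypothetical UPPERCASE rule: True means 'load and compare', False means 'capture'."""
    stripped = name.lstrip("_")  # so _Foo and __Foo count as UPPERCASE
    if not stripped:
        return False  # "_" and "__" stay wildcard/capture names
    # "Lu" = uppercase letter; caseless scripts ("Lo") never qualify.
    return unicodedata.category(stripped[0]) == "Lu"
```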
I vote no uppercase.
I still like the leading dot, actually... though I recognize that adding it back later is painless.
I also vote no uppercase.
However, as I understand it, the Unicode standard seems to have an "UPPERCASE" flag for each character that specifies whether it is uppercase or not. Ignoring leading underscores also seems reasonable enough. But since my reservations are primarily about aspects other than whether we can determine if something is uppercase, I am still in favour of not implementing this rule.
Note that even with the purist approach, there are ways to match unqualified names, using either guards or custom matchers. I recognize that using them this way is quite ugly and verbose. It might make sense if you had a match statement with a bunch of qualified names and needed one special case for an unqualified name, but if you had a bunch of unqualified names, the burden would be great enough to keep most people from using a match statement at all.
@viridia Could you vote? If you're against UPPER it's decided. If you're for we'll have to ask Ivan.
Regarding uppercase for non-Latin alphabets, Unicode has several letter categories, and "Lo" (Letter, other) is neither lowercase nor uppercase -- and there are 127,004 of those. I'm not sure but it looks like that includes (almost) all CJK "letters".
If we're still interested after the vote I can ask around.
I am -0.5 on using uppercase.
One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.
Alternatively, the rule could be "Conforms to the Python code formatting standard for an enumeration constant", in which case I would be +0.
One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.
That won't work; that would mean that variable names with no latin letters (which may be all variables for code written in some non-English languages) would default to load semantics rather than store.
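A quick sketch of the problem: under a hypothetical "contains no lowercase letters" rule, an identifier written entirely in a caseless script has no lowercase letters either, so it would be treated as a load rather than a capture. str.islower is False for characters that have no case.

```python
def no_lowercase_rule(name: str) -> bool:
    """Hypothetical variant: True means 'load', i.e. the name contains no lowercase letters."""
    return not any(ch.islower() for ch in name)


print(no_lowercase_rule("BLACK"))  # → True (intended: a constant)
print(no_lowercase_rule("color"))  # → False (capture)
print(no_lowercase_rule("名字"))  # → True: an ordinary variable name would become a load
```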
Agreed. There are many disambiguating characters available. I'm personally in favor of ^ as a prefix, which would cause no ambiguity.
On Thu, Jul 2, 2020, 7:05 PM Daniel F Moisset notifications@github.com wrote:
One other variant I would propose is, rather than "Starts with a capital letter", instead "Contains no lowercase letters". I am not sure if that is better or whether it addresses the unicode issues.
That won't work; that would mean that variable names with no latin letters (which may be all variables for code written in some non-English languages) would default to load semantics rather than store.
We could keep the door ajar for some variant of the uppercase rule by stipulating that capture variable names shouldn't start with a capital letter (after stripping leading underscores).
This shouldn't affect users whose alphabets have no case distinction (though a future decision to use a leading uppercase letter to mark dot-free loads would). It also shouldn't affect any serious use of match/case -- PEP 8 is quite clear that locals should use all-lowercase, and while I've seen plenty of code that violates the recommendation of using UPPERCASE for named constants or CapWords for class names, I don't think I've seen much code using anything but lowercase for local variables except on a whim. I doubt that anyone would even notice if we snuck this into the implementation without telling anybody. :-)
Glad to see the authors voting for no uppercase.
Many, including me, believe that using uppercase this way is bad for Python.
If it's accepted, I'd solve one minor issue by ruling that _Foo and __Foo are UPPERCASE. I don't know what to do for alphabets without lower/upper distinction but I don't see that as a show-stopper. (@thautwarm, can you help here?)
Of course.
It seems that "no uppercase" will be adopted, so maybe there is no need to consider alphabets without lowercase/uppercase concepts.
In case you still need some information about CJK languages: usually a CJK-language user would expect all characters to count as either uppercase or lowercase (because this is the routine way). However, instead of using case, CJK-language users might prefer using 「名字」 as the load of the variable 名字.
Looks like we're going with just the dot-in-the-middle rule ("purist"). People can create a dummy class and put values in there:
def foo(pt, context):
    class c:
        ctx = context
    match pt:
        case Point(0, 0, context=c.ctx): ...
I imagine most people will not use the dummy class and instead go for:
def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c == context: ...
And it's what I've seen in Haskell and Rust (see for example Listing 18-27 here)
This has been implemented. It still needs to go into the PEP, though.
I imagine most people will not use the dummy class and instead go for
def foo(pt, context):
    match pt:
        case Point(0, 0, context=c) if c==context: ...
I'm not so sure. That idiom looks backwards: we want to compare a value, but instead we extract it and then add a guard. For the human reader, a guard is much more expensive to understand, because there are many other things you could test for in a guard. Plus, now the reader is wondering: is c only used in the guard, or also in the block (the ... in your example), or in the code past the end of the whole match statement?
Also it would be repetitive if several case clauses need it.
With the adoption of the dot-in-the-middle rule, it seems that using "." as a sigil, if one is ever needed in the future, would carry a little extra weight by analogy. Surprisingly, I haven't seen* it explicitly stated or discussed anywhere, although it seems likely it was the intention: the two combined rules would have compared nicely with the traversal rules of relative imports (and the current-working-directory concept on the Linux file system). Essentially, if I understand correctly, just as foo.bar matches the constant bar loaded from the foo namespace, .bar could be said to match the constant loaded from the current one. Realizing this while reflecting on the PEP updates supplied a missing motivation that made the original PEP's constant patterns feel less arbitrary to me personally, so I wonder whether other readers missed this connection as well (not that there aren't other reasons for dropping it from the PEP, of course).
Apologies if this is obvious and/or unwanted noise.
*searching this repo, python-dev, python-ideas, the PEP versions
A bunch of folks on python-dev brought up that it's confusing to have case foo: mean "assign to foo" and have case .foo: alter the meaning to "compare to the value of foo". I think we're going to need another round of this discussion.