gvanrossum / patma

Pattern Matching
Distinguishing Loads and Stores #1

Open brandtbucher opened 4 years ago

brandtbucher commented 4 years ago

@brandtbucher:

...one could write from (x): if there are no captures, or from (x) -> (y, z): to have every occurrence of y or z in a match be a value capture.

@gvanrossum:

This looks verbose to me, and is likely to cause surprises when a new case is added but the new extraction variable isn’t added to the header.

I agree it's a bit verbose, but it seems like it clears up a lot of headaches for both the user and the compiler. Example 5, rewritten in the prevailing keyword flavor:

match rtype as (a, b):
    case RUnion(items=[none_rprimitive, a]):
        return a
    case RUnion(items=[b, none_rprimitive]):
        return b
    else:
        return None

Otherwise, to me, it's not entirely clear where none_rprimitive, a, b, and maybe even RUnion are coming from / going to. Likely, this ambiguity would need to be resolved in the eval loop (like LOAD_NAME does), which is unfortunate. Here, everything is known at compile-time.

(This could open the door to capturing types like RUnion as well... but that's a separate discussion).

My opinion is that the headaches a construct like this saves vastly outnumber those due to mistakes like the one you mentioned, which a linter could probably catch in most cases.

gvanrossum commented 4 years ago

This is an issue that most languages with a match expression/statement have had to solve.

For example, in Scala:

A variable pattern 𝑥 is a simple identifier which starts with a lower case letter

It would be interesting to make a survey of what other languages do. IIRC most require the use of some special syntax to force the alternate interpretation.

brandtbucher commented 4 years ago

IIRC most require the use of some special syntax to force the alternate interpretation.

Yeah, it does elegantly solve both problems. What about simply adding a token to the name... like ?:

Other things we could consider, piggybacking off of this:

gvanrossum commented 4 years ago

I notice Rust uses something similar to Scala; their case Point { x, y } => ... extracts the point's x and y attributes into the corresponding variables. The variable names must match the object's attributes, but there's a way to override, e.g. Point { x: x1, y: y1 } => .... I couldn't quickly find if they have a way to use e.g. named constants, but they do match on enums using syntax like this: Message::Quit => ....

gvanrossum commented 4 years ago

In Elixir you can use the "pin" operator ^ to override the variable-binding behavior. In our syntax, that would mean that

case x: ...

binds a new variable x, while

case ^x: ...

matches the value of the existing variable x.
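To make the two readings concrete, here is a rough emulation in today's Python. The helper names match_capture and match_pinned, and the dict standing in for the namespace, are hypothetical and not part of any proposal in this thread.

```python
def match_capture(subject, namespace, name):
    """Emulates `case x:`: always succeeds and (re)binds the name."""
    namespace[name] = subject
    return True


def match_pinned(subject, namespace, name):
    """Emulates `case ^x:`: compares against the existing value, binds nothing."""
    return subject == namespace[name]


ns = {"x": 42}
pinned_hit = match_pinned(42, ns, "x")    # compares 42 with ns["x"]
pinned_miss = match_pinned(7, ns, "x")    # fails, and ns["x"] is left alone
captured = match_capture(7, ns, "x")      # succeeds and overwrites ns["x"]
```

The capture form can never fail, which is why an unmarked name that was meant as a constant silently swallows every value.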

Tobias-Kohn commented 4 years ago

At the end of the day, it might not really be necessary to introduce any special syntax to distinguish between loads and stores here, anyway.

Based on the syntax currently used in our discussion, we might start by agreeing that anything that is "called" must be a class with a __match__ function (referring to the syntax proposed in issue #8). The convention I have then seen adopted most often so far is that basically every identifier captures a value (and is thus a Store). The argumentation behind it is that you could always use literal values for constant values, instead of a "constant" identifier. In case of truly user-defined types, however, we end up having a class, anyway, and that would be recognisable by the use of the parentheses afterwards.

If, in the example given above, none_rprimitive is a value that the pattern should be matched against, then there are three possibilities:

  1. none_rprimitive stands for a simple value like 42 and we could then replace it by 42 in the pattern. (incidentally, this works in Python 3 much better than in Python 2 because True and False are now keywords);
  2. none_rprimitive is a user-defined class, in which case we would write none_rprimitive() (note the parentheses) to invoke its __match__ method, anyway;
  3. none_rprimitive is not really a constant value or class in the strict sense, in which case the pattern should be expressed with a guard as, e.g., case x if x == none_rprimitive:.

Tobias-Kohn commented 4 years ago

A similar question to consider is what happens if a variable appears twice in a pattern. Think, for instance, of case Point(x, x):. Most probably, the programmer wants to express that the x and y coordinates must be equal. As Python does not have "read-only" variables, it would probably just end up assigning the value of y to the variable x.

In a function definition, def Point(x, x): is clearly an error, even if the programmer wants to express that both arguments should be identical (for whatever reason). In a similar vein, case Point(x, x): could also be an error. I think the compiler should be able to easily check that each capturing variable is assigned to only once in any given pattern.

At least in Scala, case Point(x, x): is illegal and would have to be written case Point(x, y) if x == y:.
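As a small illustration of the parallel with function definitions: Python's compiler already rejects duplicate parameter names, and the same kind of check could run over the capture names of a pattern. The helpers duplicate_captures and def_is_rejected are made up for this sketch.

```python
from collections import Counter


def duplicate_captures(names):
    """Return the capture names that occur more than once, in sorted order."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)


def def_is_rejected(source):
    """True if the compiler refuses the source with a SyntaxError."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError:
        return True
    return False


# Python already rejects duplicate parameter names at compile time.
rejected = def_is_rejected("def Point(x, x): pass")

# The analogous check for the captures of `case Point(x, x):`.
dupes = duplicate_captures(["x", "x"])

# The guard form `case Point(x, y) if x == y:` has no duplicates.
ok = duplicate_captures(["x", "y"])
```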

gvanrossum commented 4 years ago

At the end of the day, it might not really be necessary to introduce any special syntax to distinguish between loads and stores here, anyway.

That's my POV, but I'd like to convince Brandt, and I do have some niggling doubts that I'll illustrate below.

Based on the syntax currently used in our discussion, we might start by agreeing that anything that is "called" must be a class with a __match__ function (referring to the syntax proposed in issue #8).

All agree on that.

The convention I have then seen adopted most often so far is that basically every identifier captures a value (and is thus a Store). The argumentation behind it is that you could always use literal values for constant values, instead of a "constant" identifier. In case of truly user-defined types, however, we end up having a class, anyway, and that would be recognisable by the use of the parentheses afterwards.

Here I'm not so sure. It is a common convention in Python to give constants names so that it is easy to change the actual value of the constant without having to update every use site. This convention goes back at least to C, which uses UPPER_CASE for such names, and PEP 8 agrees -- but not everyone does (in particular, large parts of mypy don't). So if we have e.g.

DEBUG = 1
INFO = 2
WARN = 3
ERROR = 4
FATAL = 5

then I would like to be able to write things like

match loglevel:
    case DEBUG: print("Lots of debugging")
    case INFO: print("Some debugging")
    case WARN: print("Warnings only")
    # Etc.

This seems especially important when the constants are imported from elsewhere, e.g. from logging import DEBUG, INFO, WARN.

If, in the example given above, none_rprimitive is a value that the pattern should be matched against, then there are three possibilities:

  1. none_rprimitive stands for a simple value like 42 and we could then replace it by 42 in the pattern. (incidentally, this works in Python 3 much better than in Python 2 because True and False are now keywords);

See above.

  2. none_rprimitive is a user-defined class, in which case we would write none_rprimitive() (note the parentheses) to invoke its __match__ method, anyway;

I think that's the case in this example. The definition is on L248 in the same file and it is clearly a constant (annotated with # type: Final) but for various reasons it is an instance of a specific class (RPrimitive).

  3. none_rprimitive is not really a constant value or class in the strict sense, in which case the pattern should be expressed with a guard as, e.g., case x if x == none_rprimitive:.

That would look a bit clumsy in this case.

As a compromise (where have I seen this before?) I would propose that if the name is all lowercase (including underscores and digits) it is an extraction target and if it starts with or contains an uppercase letter it is considered a named constant (a load). We would then have to recommend that the mypy project rename things like none_rprimitive to e.g. NONE_PRIMITIVE.

Note that for constants prefixed with a module and/or class name (anything with a . in it) there should not be a problem -- those are always considered constants.
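The casing rule as stated here is simple enough to write down. The following is only a sketch of that rule; the function name classify_name and the "load"/"store" labels are assumptions made for illustration.

```python
def classify_name(name):
    """Classify a name in a pattern as 'load' (constant) or 'store' (extraction target)."""
    if "." in name:
        return "load"    # dotted names are always treated as constants
    if any(ch.isupper() for ch in name):
        return "load"    # starts with or contains an uppercase letter
    return "store"       # all lowercase, possibly with underscores and digits


examples = {
    name: classify_name(name)
    for name in ["DEBUG", "x1", "none_rprimitive", "NONE_PRIMITIVE", "rtypes.none_rprimitive"]
}
```

Under this rule none_rprimitive comes out as a store, which is why the rename (or the dotted spelling) would be needed for it to act as a constant.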

gvanrossum commented 4 years ago

Regarding Point(x, x) -- I agree that the compiler could treat this as an error. Does this generalize to any two variables extracted in different parts of the same pattern? E.g.

case Foo(x, bar=Bar(x, y)):

Or what about

case Foo(x) | Bar(x):

?

Tobias-Kohn commented 4 years ago

Yes, I see your point with the constant values coming from other libraries, or the need for more complex constant objects. Although, as you pointed out, you could always write constants from other modules as attributes with a dot (e.g., rtypes.none_rprimitive). And I think we certainly agree that attributes should always be treated as loads/constants.

Using upper- vs. lowercase to distinguish constants and variables/targets seems to me like a good compromise. If I recall correctly, Scala uses this convention, based on the Java convention that classes and constants start with an uppercase character, whereas variables do not.

Tobias-Kohn commented 4 years ago

Concerning Point(x, x), I think that case Foo(x, bar=Bar(x, y)) clearly is an error as well, because x is assigned to twice here. Pattern matching is very convenient for "flattening" nested structures, i.e. we could not distinguish the two xs here -- and moreover, we should not.

The case Foo(x) | Bar(x): is entirely illegal -- at least in Scala. Every variable in a pattern there must be assigned to exactly once, so that it has a uniquely determined value. As a consequence, OR patterns cannot contain any variables. But I wonder whether such a strict rule makes sense in case of Python, or whether we could find something that is more lenient/"duck typey". I have a vague feeling that coming up with good rules for this might be a bit tricky (I could be wrong, though).

gvanrossum commented 4 years ago

It should not be hard for the parser to check that there can only be one assignment to any given variable. This should be similar to the check for duplicate parameter names in functions.

For Foo(x) | Bar(x) we could make a more lenient rule that allows one assignment to a variable in each branch. I haven't figured out the entire algorithm needed but I don't expect it to be difficult.
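One possible shape for such a check, as a sketch only: the node classes Capture, Seq and Or are a toy pattern representation invented here, and letting an OR pattern bind the union of its branches' names is an assumption, since the thread leaves that detail open.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    name: str


@dataclass
class Seq:
    items: list          # sub-patterns that must all match together


@dataclass
class Or:
    alternatives: list   # sub-patterns of which exactly one is used


def bound_names(pattern):
    """Return the set of names a pattern binds; raise SyntaxError on a double binding."""
    if isinstance(pattern, Capture):
        return {pattern.name}
    if isinstance(pattern, Seq):
        seen = set()
        for item in pattern.items:
            names = bound_names(item)
            clash = seen & names
            if clash:
                raise SyntaxError(f"multiple assignments to {sorted(clash)} in pattern")
            seen |= names
        return seen
    if isinstance(pattern, Or):
        # Only one alternative can match, so the same name may appear in several.
        result = set()
        for alternative in pattern.alternatives:
            result |= bound_names(alternative)
        return result
    raise TypeError(f"unknown pattern node: {pattern!r}")


# Foo(x) | Bar(x): accepted, binds x once whichever branch matches.
foo_or_bar = Or([Seq([Capture("x")]), Seq([Capture("x")])])

# Foo(x, bar=Bar(x, y)): rejected, x is assigned twice in one branch.
nested = Seq([Capture("x"), Seq([Capture("x"), Capture("y")])])
```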

There is another issue regarding variable assignments -- suppose we have a pattern

case (x, y, str()): print("hit")
case _: print(x, y)

and the target is (1, 2, 3). Assuming sub-patterns are matched from left to right, will x and y have been assigned when the default case is reached? Or will the values somehow be held "in escrow" until the whole pattern is known to match? That requires more storage, and still isn't good enough if there's a failing guard:

case (x, y) if x == y: print("pair")
case (_, _): print(x, y)

Clearly the variables have to be assigned before the guard is executed. Should they be "unset" if the guard fails?

The simplest approach just sets variables as matching progresses and leaves them set if we fail at a later sub-pattern. This is similar to how the walrus operator works -- it just assigns to a local variable with standard local scope, e.g.

if (x := t[0]) and (y := t[1]) and x == y: print("pair")
elif x and y: print(x, y)

This behavior in turn comes from Python's general reluctance to introduce nested local scopes.

(We do that for comprehensions so it's not entirely unheard of, but there are consequences, e.g. assignment to variables in the containing scope would have to be marked with nonlocal or else they would be assumed to be bound in the local scope. That's not usually a problem in comprehensions, and we made a specific exception for the walrus operator here, but case blocks seem different -- somehow I'd prefer to be able to translate a pattern into a bunch of tests, with extractions mapped to walrus assignments, and that would lead to all extraction variables living in the scope of the containing function/class/module.)
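Here is a hand-written sketch of what such a translation could look like for the (x, y, str()) example, assuming left-to-right matching. The function name demo and the return values are made up; this is not generated by patma.py.

```python
def demo(target):
    # Rough translation of:
    #     case (x, y, str()): "hit"
    #     case _: report x and y
    if (
        isinstance(target, tuple)
        and len(target) == 3
        # The (..., True)[1] wrapper keeps a falsy value from failing the test.
        and ((x := target[0]), True)[1]
        and ((y := target[1]), True)[1]
        and isinstance(target[2], str)
    ):
        return "hit"
    # x and y were assigned before the str() test failed, so they are visible here.
    return ("default", x, y)


leaked = demo((1, 2, 3))
matched = demo((1, 2, "three"))
```

With the target (1, 2, 3) the default branch sees x and y already set. If the target is not a 3-tuple at all, the walrus assignments never run and the default branch raises UnboundLocalError, which is the flip side of the same simple rule.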

brandtbucher commented 4 years ago

Using the casing of the names to distinguish these seems very un-Pythonic to me. Python's compiler does an excellent job of elegantly and intuitively sorting out local, nonlocal, and global names, while providing workarounds for the less-trivial cases. I would feel disappointed if we settled for case-sensitivity rather than finding a similarly elegant solution here.

It seems to me there are four "easy-to-implement" options, and two "hard-to-implement" options.

Easy:

  1. Force users to declare explicitly either the load or store targets. This has some precedent today, in def, nonlocal, global, etc. See the proposal that started this thread as an example.

  2. Add a token to either the load or store targets (probably store). Doesn't require any context of the surrounding code. Easy-to-read, easy-to-write, easy-to-teach, easy-to implement. Lots of (good and bad) options to choose from: <x>, x?, =x, x=, ...

  3. Use the casing of the names to disambiguate. I really, really wouldn't want to have to explain to a beginner why this matters here, but nowhere else.

  4. Outlaw loads entirely. This is attractive for compiler/vm writers but awful for users, for many of the reasons Guido mentioned above. Not only do I lose my great symbolic constants (which, let's remember, could differ based on architecture/os/implementation/whatever), but anytime I forget this rule, I'll silently match and overwrite my nice name that's used everywhere.

Hard:

  1. Try to figure it out at compile-time using the known local namespace. This will likely either restrict where we can put these statements, or change the semantics subtly for function-scope vs everywhere else. Another similar flavor is trying to resolve things at function definition-time, which has the same issues.

  2. Try to figure it out at runtime by attempting a load (from which namespace?) and falling back on a store. This feels like it will usually do what you want, but occasionally surprise you in tough-to-diagnose ways.

I feel that Easy 1 and Easy 2 are our best bets. It should, ideally, be trivial to refactor old code to use the new feature without having to change or resolve the names of other code elements.

viridia commented 4 years ago

I have a slight preference for ?x over x?. The latter, to me, means 'optional'. $x also works for me, or any other single character that is traditionally interpreted as being a variable interpolation in other languages (shell, make, JS, etc.). Pattern matching is a kind of 'reverse interpolation'. (Although I wouldn't suggest using sscanf %s).

I have a slight preference for using a token (Easy 2) over the other options.

brandtbucher commented 4 years ago

This behavior in turn comes from Python's general reluctance to introduce nested local scopes.

I recently surprised someone by showing them how easy this was:

class _:
    for x in y:
        ...

It's just the Python equivalent of "wrap it in braces". :wink:
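For readers who have not seen the trick, a slightly expanded sketch of it follows (the wrapper function demo and the names values and item are invented here). The loop variable ends up as an attribute of the throwaway class instead of a variable in the surrounding scope.

```python
def demo():
    values = [1, 2, 3]

    class _:
        # The loop variable is bound in the class namespace, not in demo().
        for item in values:
            pass

    # `item` never became a local of demo(); it lives on as the attribute `_.item`.
    return "item" in locals(), _.item


leaked_into_function, last_item = demo()
```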

gvanrossum commented 4 years ago

This reminds me of an idea I learned nearly 40 years ago, in the context of the design of Python's predecessor, ABC (I wasn't the designer then :-). Too many languages were designed with the convenience of the compiler in mind, requiring the user to do extra work. Forcing the user to mark either loads or stores with extra syntax (either Easy 1 or Easy 2) because it would be too hard for the compiler to figure it out seems like a fallback to those pre-ABC days.

Hard 2 is unacceptable because it means the compiler cannot properly generate code for this. (Check out the translate() methods in my sample code in patma.py -- it translates every pattern type into simple code that works in today's Python. The idea would be to do this at the AST level so the bytecode compiler wouldn't have to know about patterns.)

I think Hard 1 is doable -- the compiler already knows the local namespace and all containing function namespaces (in case of nested functions). It could be extended to know what's (potentially) set in the global namespace as well, except for from blah import *.

But Hard 1 has another issue. There may be variables in outer scopes with simple names (such as x or item) that the user doesn't remember exist in outer scopes and that are obvious names to choose as extraction variable names.
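The information the compiler has available for something like Hard 1 can be inspected with the standard symtable module. This is only an illustration of what is known at compile time, not a proposal for how the match compiler would work; the sample source and the helper classify_function_names are made up.

```python
import symtable

SOURCE = """
LIMIT = 10

def f(point):
    x = point[0]
    return x < LIMIT
"""


def classify_function_names(source, function_name):
    """Map each name used in a top-level function to 'local', 'global' or 'other'."""
    module = symtable.symtable(source, "<example>", "exec")
    func = next(t for t in module.get_children() if t.get_name() == function_name)
    kinds = {}
    for sym in func.get_symbols():
        if sym.is_global():
            kinds[sym.get_name()] = "global"
        elif sym.is_local():
            kinds[sym.get_name()] = "local"
        else:
            kinds[sym.get_name()] = "other"
    return kinds


kinds = classify_function_names(SOURCE, "f")
```

A name such as x that happens to exist in an outer scope would show up here as non-local, so a Hard 1 rule would read it as a load even if the author meant it as a fresh extraction variable.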

I admit that Easy 3 (the case distinction) is also a form of letting the user mark up the distinction, but it has the advantage that it uses an established convention for variable names and constant names, and I am still in favor of it -- it reads quite natural to me, compared to x? or ?x or <x> or =x or x=.

I am not all that worried about having to explain the inconsistency to beginners. Beginners usually don't see all that much consistency -- they see an overwhelming array of confusing notation, and they use their brain's pattern matching to sort it out (some more successful than others :-). It's like when you're explaining a board game to a new player: you inevitably end up mixing hard rules and important strategy concerns together, and beginners often don't know whether they can't make a certain move because it's forbidden or just because it would be a bad idea strategically. (Due to shelter in place I have witnessed this a fair bit, and experienced it myself as well. :-)

There are tons of other things we're planning to do slightly different from other places in the language -- e.g. we all seem to agree that the target "ab" should not be a match for the pattern (a, b) even though (a, b) = "ab" works in unpacking assignment (#15). We also seem to be okay with not supporting raw iterators as matches for sequence patterns (#7).

brandtbucher commented 4 years ago

Forcing the user to mark either loads or stores with extra syntax (either Easy 1 or Easy 2) because it would be too hard for the compiler to figure it out seems like a fallback to those pre-ABC days.

Okay, then let's seriously consider Hard 1.

First, a huge data point which aligns with your observation: of all of the listed options, your rough rewriting of our examples just naturally works with both of the "Hard" options, and doesn't work with any of the "Easy" options (including case-sensitive contexts). That's probably because it's a very natural style to those already familiar with the way Python uses names, and builds on the same rules that the language has successfully used for decades. As a result, it can usually "just work" in any existing codebase.

But Hard 1 has another issue. There may be variables in outer scopes with simple names (such as x or item) that the user doesn't remember exist in outer scopes and that are obvious names to choose as extraction variable names.

Hm, maybe. We really have two cases to consider here:

If this is the biggest issue, I still think Hard 1 is attractive.

The idea would be to do this at the AST level so the bytecode compiler wouldn't have to know about patterns.

To clarify, are you suggesting the final implementation wouldn't involve changes to the VM/compiler? Or just the proof-of-concept?

It seems that these statements will involve some specialized moves/mechanics that could be greatly aided by specialized bytecode. I'm not opposed to the idea of making this an AST transformation (I think it's too early to tell), but it seems surprising to be basing fundamental design decisions on that assumption.

I think it's important to recognize that, unlike all of the third-party pattern-matching libraries we're drawing inspiration from, we will have the huge benefit of first-class language support.

There are tons of other things we're planning to do slightly different from other places in the language...

Well, I'm not a huge fan of this line of reasoning. Even so, "case-sensitive name contexts" definitely puts the others to shame!

viridia commented 4 years ago

It seems that we have gotten to a bunch of workable solutions and are debating aesthetics at this point - which is more unpalatable and less Pythonic, a prefix character or case-sensitive name contexts?

To avoid an impasse, I suggest several approaches:

1) Write some real-world code using the various techniques, in a larger context. Possibly even re-write some existing code to use the new functionality. Often when considering an idea in the abstract, we over- or under- estimate the impact of a design decision; our availability bias means that we tend to weigh more heavily examples that we can easily bring to mind. For example, do we yet have a sense of the relative proportion of loads vs. stores?

Also, aesthetics are affected by familiarity - something may seem ugly at first, but eventually may grow on you. I know this is true for many first-time Python users unfamiliar with the use of indentation to define block structure. So too with these ideas.

2) User testing: write up some of the more problematic cases and show them to Python programmers who have not previously been part of the discussion; without coaching them or prompting them, ask what their intuition tells them about how the code will behave.

Of course, once this reaches the formal PEP stage, there will be an order of magnitude more bikeshedding...

Tobias-Kohn commented 4 years ago

Writing or looking at real-world code is IMHO a great idea to think about how to move forward. Also because it seems like pattern matching has two wildly different parents. On the one hand we have the switch/case from Algol/C as a table of options to choose from. On the other hand there is the pattern matching as a form of "checked assignment" as in functional languages (from the latter we already have a bit in Python with sequence unpacking). If you come from Algol then it is most natural and important to have simple loads, whereas if you come from functional languages, it is most natural to have simple stores (as loads are rather rare and basically always expressed as literal values, anyway -- see my comments above). So, to some extent we have the problem of two different communities, each of which will have a different idea of what we are doing/proposing here.

Now, if I may add my little bit of experience with pattern matching: although it is not written in Python itself, I have just had a look at how I used match/case-statements in my larger projects such as the Python-parser. To my own surprise, I predominantly use it as a glorified switch to check for enum-values. Interestingly enough, however, those are written as attributes, anyway (e.g., case TokenType.LEFT_PARENS) and are therefore clearly loads (because of the dot). Apart from that, I use it to check for types and extract some values/fields if one of the types matches (e.g., case IntegerToken(pos, value)), sometimes combined with a guard like, e.g. case IntegerToken(pos, value) if value < 0. Also taken from looking at my code: I tend to use it as a switch table primarily when there are a lot of alternatives and an if chain would be cumbersome/inconvenient, whereas usages as value extractors rather contain just a few alternatives. Even though it is possible to use named constants in Scala (like the none_rprimitive example at the very beginning), I find that I just never came across an example where I had/wanted to do that (other than referring to them as attributes, that is).

If nothing else, this might explain why I am so much more in favour of assigning to all names that are not used as "attributes" or "calls" :-). And the upper-/lowercase convention is very familiar to me, too, which means that I am hugely biased in this matter.

Tobias-Kohn commented 4 years ago

For what it's worth, let me also respond to Brandt's great list.

  1. I am personally not a big fan of the nonlocal and global statements (although I completely accept their necessity). Unless we are very careful and find a solution that works 99% of the time without such additional keywords, I feel that introducing something like that might spoil much of the convenience of pattern matching in the first place. The match X as y would not work properly because the variables effectively used in the different case statements would probably differ quite significantly.
  2. These additional tokens seem very arbitrary to me, and I believe that adding another symbol would actually add more of a hurdle for people who need or want to read Python code, but are not familiar with the exact meaning of a question mark, say, in front of a name.
  3. I do understand the reluctance, but I do not agree with the argument: I do not think that pattern matching is a feature intended for beginners, anyway (not to mention that there are already a lot of "oddities" for beginners to cope with, including why an if-elif-else chain has to start with an if instead of an elif, say, or when to use =, ==, and :=). The argument that there are "constant" objects with lowercase names that we want to use in pattern matching seems more convincing to me.
  4. Again, although I understand the sentiment, I am not too happy with the argument of overwriting names. You would merely introduce a local variable that shadows your global variable, not overwrite its value with ramifications everywhere.

A combination of 1 and 4 might be a viable compromise, though, in the following sense. I would still claim that in most cases of pattern matching, you use names as targets/stores to extract values from a data structure. Constant values are expressed as either literals or attributes. If you wanted to use a "regular" name as constant, you declare that in the beginning. But, again, I feel that something like this only makes sense if we are pretty sure that most use cases have no need for such an additional keyword (and my view on "regular use case" here is certainly biased).

Finally, if we go for "hard 1", I would only consider global names that are directly assigned on the module level. This would not only exclude anything imported via from egg import *, but also things like module.foo = ... or things assigned via global inside a function. Otherwise, we quickly end up in "hard 2", where it is no longer the compiler that can reliably determine what is a globally accessible name, but it has to be done at runtime.
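A sketch of that restriction using the standard ast module is shown below. The function name module_level_constants and the sample source are hypothetical, and whether explicit imports such as from logging import DEBUG should also count is left out here as an open choice.

```python
import ast


def module_level_constants(source):
    """Collect plain names that are assigned directly at module level."""
    names = set()
    for node in ast.parse(source).body:
        if isinstance(node, ast.Assign):
            targets = node.targets
        elif isinstance(node, ast.AnnAssign):
            targets = [node.target]
        else:
            continue  # skips imports (including `import *`), defs, and so on
        for target in targets:
            if isinstance(target, ast.Name):  # ignores `module.foo = ...`
                names.add(target.id)
    return names


SOURCE = """
from egg import *
import module
none_rprimitive = object()
DEBUG: int = 1
module.foo = 2

def setup():
    global late
    late = 3
"""

found = module_level_constants(SOURCE)
```

Only none_rprimitive and DEBUG are picked up; the star import, the attribute assignment and the name set via global inside a function are all ignored, so the answer never depends on runtime state.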

gvanrossum commented 4 years ago

My main desire is to be able to write case (x, y): ... -- that was one of the selling points of this notation in the first place. If we have to distinguish between x the assignment target and x the named constant, maybe we can use a relatively unobtrusive notation like +x, ^x or .x? But I still think looking at the case of the variable would cover 99% of the cases.

brandtbucher commented 4 years ago

But I still think looking at the case of the variable would cover 99% of the cases.

Yeah, that's the one thing that redeems this for me. It also helps that no perfect solution seems to exist yet.

I'm okay with moving forward using case-sensitivity for the POC, since it looks like it has 50% support here anyways and is relatively painless to change later.

But just for the sake of moving forward. :slightly_smiling_face:

Tobias-Kohn commented 4 years ago

I looked at a few other languages and how they handle this issue, and found an approach taken by Thorn to be interesting.

Scala just uses backticks to mark an identifier as a constant value instead of a variable to be assigned to:

case `pi` => print("This is 3.1415926...")

This is also used to allow keywords to pass as identifiers/attribute names, which is important because of interoperability with Java (an issue that also comes up with Jython).

Thorn introduces several concepts in this regard (however, I got the feeling that Thorn is a research language to try out various concepts, anyway). Their main reasoning, however, is that something is either a variable, or an expression. Variables take on the value of whatever they match in their respective positions. Expressions always yield values, against which the object's values are then compared.

However, to properly distinguish between expressions and patterns (including 'store' variables), they introduce the evaluation operator $. This is in spirit extremely close to string interpolation in Python: $(...) can contain any expression, which is then properly evaluated (without any pattern matching whatsoever) and the result is injected into the pattern match. Hence, $pi would then yield the numeric value again, and not overwrite the variable pi. Because variables are assigned to immediately, it is then possible to have a pattern like (x, $x), which will check whether both elements of the tuple are equal. But you could equally do (x: int, $(x+1)) to check for two consecutive integers, say.

Although the syntax +x looks like an expression, it has a different meaning. They argue that + is the identity operator on all values—except None (null), on which it is not defined. Hence, they effectively use +x to bind the variable x to any value but None. Its semantics is thus something like if value is not None: x = value.

Instead of having a symbol to discriminate load/store semantics as such, I think the notion of an interpolation operator might be something worth discussing or considering. On the one hand, it has a precedent in Python with string interpolation; on the other hand, it opens up many more possibilities than a simple load marker. On the flip side, though, it would mean thinking hard about evaluation order and when variables are effectively assigned to. And we should also be aware that basically the very same thing could be achieved using guards.
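To illustrate the last point, the two Thorn patterns mentioned above can be spelled as plain tests plus a guard-like comparison. This is a hand translation into ordinary Python under my own reading of the Thorn semantics; the function names are invented.

```python
def match_equal_pair(value):
    """Guard-style equivalent of the Thorn pattern (x, $x)."""
    if isinstance(value, tuple) and len(value) == 2:
        x, second = value
        if second == x:          # plays the role of the guard `if second == x`
            return {"x": x}
    return None


def match_consecutive(value):
    """Guard-style equivalent of the Thorn pattern (x: int, $(x+1))."""
    if isinstance(value, tuple) and len(value) == 2:
        x, second = value
        if isinstance(x, int) and second == x + 1:
            return {"x": x}
    return None


equal = match_equal_pair((3, 3))
unequal = match_equal_pair((3, 4))
consecutive = match_consecutive((3, 4))
```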

ilevkivskyi commented 4 years ago

I considered allowing $(...) with arbitrary expression (not just $name) to load something a while ago. But after all I think it may not be a great idea. Some arguments:

ilevkivskyi commented 4 years ago

Just to clarify, the cognitive load for remembering the load/store context extends not only to plain names but also to all enclosing calls, for example these two are very different:

match shape:
    as Line(Point(x1, y1), Point(x2, y2)):
        ...
    as $Line(Point(x1, y1), Point(x2, y2)):
        ...

This particular case was probably the main reason why I abandoned this idea.

Tobias-Kohn commented 4 years ago

I agree that $ is not visually appealing and you make a very good point with the Line-example. I am not so sure about "cognitive load", though, or how rarely it is actually used. I think for the latter, we should be more open than just looking at how existing Python code would translate to pattern matching.

However, my main point is that if we want to consider a marker for loads, I would argue that we consider something more general than just a plain load marker. Although the dot-syntax in issue #19 is kind of nice, and would nullify this idea here.

viridia commented 4 years ago

Question: is Point(x, .x) valid? I suspect not, because of order-of-evaluation issues - x has not yet been assigned at the moment where .x is being evaluated.

viridia commented 4 years ago

A related question: Is the following illegal?

match someValue:
    as [x] if x < 0:
        print(x)
    as _ if x >= 0:
        print(x)

The second match arm refers to 'x' because it was assigned a value in the first match arm (which is evaluated first), and we don't 'rewind' variable bindings if a match arm fails.

I feel like it should be illegal, but I'm not sure how you would detect a case like this.

gvanrossum commented 4 years ago

@Tobias-Kohn:

However, my main point is that if we want to consider a marker for loads, I would argue that we consider something more general than just a plain load marker. Although the dot-syntax in issue #19 is kind of nice, and would nullify this idea here.

The dot-syntax (and any name-marker syntax, in fact) is easily extensible in the future. E.g. suppose we currently just allow .name; in the future we could add .(expression) without problems.

@viridia:

Question: is Point(x, .x) valid?

I propose not to allow reusing variable bindings later in the pattern (of course it is allowed in guards). As this example shows it would be an invitation to cleverness. The Python compiler could easily detect and reject this.

Is the following illegal?

match someValue:
    as [x] if x < 0:
        print(x)
    as _ if x >= 0:
        print(x)

That looks horrible, and should probably be rejected by a static checker, but I don't want to complicate the Python compiler for match statements to be saddled with the responsibility of detecting/rejecting it. But if someone finds an easy way of detecting it I wouldn't object.

Python's current compiler does do a complete analysis of local name usage, and maybe we could add some kind of rule that states that if a variable is bound anywhere in a given match statement, it can only be used in guards (and blocks?) for cases that actually bind it.

But traditionally this is the kind of thing where we tell users not to do it without making it impossible -- certainly the integrity of the virtual machine is not at stake here.

Tobias-Kohn commented 4 years ago

I fully agree.

I propose not to allow reusing variable bindings later in the pattern.

Yes. There are cases like Foo(Bar(x), x), where it is not necessarily obvious which of the x is the one that gets bound first. A strict left-to-right rule would suggest the first one. But considering practical aspects, we might want to unpack Foo first (thereby binding the second x), and then match its first argument against Bar(x). Hence, not allowing the reuse of variable bindings inside the pattern probably makes our lives much easier.

match someValue:
    as [x] if x < 0:
        print(x)
    as _ if x >= 0:
        print(x)

That looks horrible, and should probably be rejected by a static checker, but I don't want to complicate the Python compiler for match statements to be saddled with the responsibility of detecting/rejecting it.

I think if we wanted to fully avoid this, we would have to introduce a separate scope for each match case to make sure that no bindings escape. However, some future implementation might be able to optimise pattern matching, reorder some cases, or check them "in parallel". It might therefore be a good idea to explicitly state that the x in the second case as _ if x >= 0 is undefined. It might have the value assigned in the previous case, but there is no guarantee that it does. In fact, the only guarantee that we give is that all bindings of a successful match survive the entire match block.
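One strict way to honour that guarantee is to collect bindings separately and hand them over only when the pattern and its guard both succeed. The sketch below shows that reading only; match_case, single_item and wildcard are hypothetical helpers, and nothing in the thread commits to this implementation.

```python
def match_case(subject, pattern_fn, guard=None):
    """Return the bindings of a successful match, or None without leaking anything."""
    bindings = pattern_fn(subject)      # dict of captures, or None on failure
    if bindings is None:
        return None
    if guard is not None and not guard(**bindings):
        return None                     # guard failed: the bindings are discarded
    return bindings


def single_item(subject):
    """Pattern function for `[x]`."""
    if isinstance(subject, list) and len(subject) == 1:
        return {"x": subject[0]}
    return None


def wildcard(subject):
    """Pattern function for `_`, which binds nothing."""
    return {}


# First arm: `as [x] if x < 0` fails on [5], and x does not survive.
first = match_case([5], single_item, guard=lambda x: x < 0)

# Second arm: `as _` matches but has no x to offer to a guard or block.
second = match_case([5], wildcard)
```

In this model a guard on the second arm that mentions x, such as lambda x: x >= 0, fails with a TypeError because no x was bound, which makes the "undefined" status of x explicit instead of accidental.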

The dot-syntax (and any name-marker syntax, in fact) is easily extensible in the future.

Although it might be interesting to consider this some time in the future, I am also happy with using the dot just as a load marker. A leading dot to mark the current namespace seems natural enough to me and a good compromise.