gvanrossum / patma

Pattern Matching
1.03k stars 65 forks source link

SyntaxError if 'case NAME:' is not the last case #148

Open gvanrossum opened 4 years ago

gvanrossum commented 4 years ago

There were some concerns over the typical footgun of writing

case FOO:
    <code>
case BAR:
    <code>

(with the intention that FOO and BAR are global constants, not capture patterns).

The CPython implementation issues a SyntaxWarning for this. Maybe we should just make that a SyntaxError. (Even though it's hard to express even in the PEG grammar.) Mark Shannon also wants this.

I am fine with this.

Tobias-Kohn commented 4 years ago

I am also fine with this, and think it actually has merit to point out obvious mistakes.

And to be fair: some syntax errors cannot be sensibly expressed by a context-free language, anyway. Take the rule that no two parameters in a function can share the same name, for instance. So, we do not necessarily have to put that into the grammar.

brandtbucher commented 4 years ago

I'm okay with this as well. People ignore warnings, and I cannot imagine this ever being "correct" (except perhaps in poorly-generated code).

gvanrossum commented 4 years ago

If we ever add an else clause, an irrefutable case must not be followed by any else clause. #147

Tobias-Kohn commented 4 years ago

an irrefutable case must not be followed by any else clause

Generally, I agree with this. But I would be careful with the wording. Irrefutable cases can actually take quite complicated forms, making it impossible for the compiler to decide whether they really are irrefutable or not. So, in short: the rule as it stands cannot be checked.

Perhaps it would be enough to talk about the irrefutable case and define somewhere that it refers to the case <NAME>...?

gvanrossum commented 4 years ago

Hm, in my naïveté I think irrefutability is easily checked recursively.

That’s it. Presence of a guard makes a case non-irrefutable.

What am I missing?

Tobias-Kohn commented 4 years ago

I was primarily thinking of the last case you mention:

Presence of a guard makes a case non-irrefutable.

This is not necessarily the case semantically as in, e.g., case x if 2 < 3:. It has a guard, however, it will always succeed, hence it is semantically irrefutable, although it might be impossible for the compiler to discover this for all possible conditions in a guard.

The second objection to the list would be class patterns. case object(): is also irrefutable, and if we allow the isinstance to be overwritten, any class C might end up being irrefutable.

In the context with OR, I would think of a silly example like case (mutable() | immutable()):, which I would claim is also irrefutable.

Long story short: what we need is syntactic irrefutability (irrefutability that can be decided purely based on the syntax), which is naturally a subset of semantic irrefutability (everything that is 'logically' irrefutable). Obviously the compiler can only really check the former.

Personally, I would not have a problem with defining what we mean by syntactic irrefutability and then state that any such case must be the last branch in a match statement (including the potential else:). But come to think of it, I am a bit afraid that it would add another set of rules to the PEP, which is already perceived as having too many rules... So, the question is: is it really worth the trouble? Or do we just go with the irrefutable case (where the <NAME> in my post was meant to include the wildcard, of course)?


EDIT: We could also just say that the compiler will reject any case branch (again, including else) that follows a case for which it can prove that it is irrefutable. In other words: the compiler will complain about anything following syntactic irrefutability, but leave it open to the exact extent of syntactic irrefutability. The only condition is that a simple name (or wildcard) is in this set.

What speaks against this, of course, is that if the compiler gets better in detecting such cases we loose backward-compatibility. A program that runs with Python 3.10.0 might not run with 3.10.1 because the latter rejects an unreachable case branch. If pattern matching is used in libraries, this might lead to quite a mess. Hence not a good idea!

gvanrossum commented 4 years ago

I think it is worth it defining which patterns are syntactically irrefutable (basically my definition) and stating that any guard-less case whose pattern is syntactically irrefutable must not be followed by another case or else.

Compilers that are smarter than that can issue SyntaxWarnings.

Note that case object(): may not be irrefutable if a global variable object exists or if builtins.object has been replaced.

I am not worried that adding this rule will be considered as a problem (and if it is, we can offer only to declare case NAME: to have this requirement).

dmoisset commented 4 years ago

Having an or-pattern where one in the left is irrefutable is a good candidate for errors too. For example in:

match value:
    case None: ...
    case 0: ...
    case MAX_VALUE|MIN_VALUE: ... # I intended to use these names as constants here
gvanrossum commented 4 years ago

Agreed.

brandtbucher commented 4 years ago

Hm, in my naïveté I think irrefutability is easily checked recursively.

  • a single capture pattern is irrefutable
  • a wildcard is irrefutable
  • an OR is irrefutable if any of its sub-patterns is
  • a walrus is irrefutable if the RHS is

That’s it. Presence of a guard makes a case non-irrefutable.

Does this sound good, then? If so, I'll update the spec and implementation.

gvanrossum commented 4 years ago

Yup, still sounds good to me. Let's do it.

gvanrossum commented 4 years ago

See language in https://github.com/python/peps/pull/1649 -- please review.

gvanrossum commented 4 years ago

(Landed.)

brandtbucher commented 4 years ago

Just got back from a trip. Sorry for the delayed review... but this looks good to me anyways! My one lingering question about patterns like case _ | _ was addressed in the linked PR: as expected, this too is a syntax error.

gvanrossum commented 4 years ago

Labeling as fully pepped -- this is now incorporated in PEP 634. (And also implemented, thanks Brandt!)