tc39 / proposal-pattern-matching

Pattern matching syntax for ECMAScript
https://tc39.es/proposal-pattern-matching/
MIT License

Proposed simplification & increased symmetry #322

Open erights opened 7 months ago

erights commented 7 months ago

Playing a little fast and loose with the metagrammar, I think this proposal should be simplified to

A.1 Expressions


The main change is in how comparing against values is distinguished from introducing bindings. The existing proposal treats a bare Identifier as a lexical reference to a variable from the enclosing scope, so it introduces bindings with a let, const, or var prefix. This is not at all like destructuring. IMO people will be confused by this asymmetry.

I saw in the proposal a suggestion that a pattern be able to start with a (normally binary) relational operator followed by an expression. This is the opportunity to restore the symmetry with destructuring! This gives a way to introduce expressions whose value can be compared, including expressions using variables from outer scopes. Any Identifier appearing normally in a pattern is a binding occurrence, just as one expects from destructuring.
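
For reference, a minimal sketch of the destructuring behavior this symmetry argument appeals to, in today's JavaScript. The names `expected`, `firstPage`, and `startsWithExpected` are made up for illustration and do not come from the proposal.

```javascript
// An outer variable. Destructuring never compares against it.
const expected = 1;

function firstPage(res) {
  // `page` is a binding occurrence: it introduces a new variable.
  // Nothing here tests whether `res` has the expected shape.
  const { data: [page] } = res;
  return page;
}

function startsWithExpected(res) {
  const { data: [page] } = res;
  // Comparing against an outer value needs a separate expression today.
  return page === expected;
}

console.log(firstPage({ data: [7, 8] }));       // 7
console.log(startsWithExpected({ data: [1] })); // true
console.log(startsWithExpected({ data: [2] })); // false
```

The suggestion above would keep the first form as-is inside patterns and mark the comparison (here `=== expected`) with a leading relational operator.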

erights commented 7 months ago

Separately, I agree with @tabatkins that the when should be dropped. I find the argument from indentation + ; convincing. But here I focus on the pattern side.

Jack-Works commented 7 months ago

We used to make Identifier a binding, but that approach is considered harmful. You can see the historical discussion at https://github.com/tc39/proposal-pattern-matching/issues/281

EqualityExpr ~= MatchPattern

I'm unsure why ~= is better than is. We don't have many symbols left so I prefer is.

void and ...void

It will be added by https://github.com/tc39/proposal-discard-binding. If that proposal succeeds, apparently ...void will also be valid.

MatchPattern = AssignmentExpr

We haven't seriously considered this yet, but we have an editor note in the spec "It is possible to add Initializer to MatchProperty and MatchElement."

consider && and ||

We've discussed that in the past, see https://github.com/tc39/proposal-pattern-matching/issues/179

consider !

not is a natural choice after we use and and or.

ljharb commented 7 months ago

@erights i'm a bit confused about what your OP is proposing. Are you suggesting we flip back to "identifiers are irrefutable patterns", and use a sigil to mark bindings, as opposed to let/const?

erights commented 7 months ago

On Mon, Apr 8, 2024 at 8:14 PM Jack Works wrote:

We used to make Identifier a binding, but that approach is considered harmful. You can see history discussion at #281 https://github.com/tc39/proposal-pattern-matching/issues/281

That is a huge thread! Where should I look? Or could you summarize?

This issue of bare identifiers is my biggest concern, so clarification appreciated. Thanks!

Jack-Works commented 7 months ago

That is a huge thread! Where should I look? Or could you summarize?

The core problem is readability. You can see Yulia's analysis of the old syntax https://docs.google.com/document/d/1dVaSGokKneIT3eDM41Uk67SyWtuLlTWcaJvOxsBX2i0/edit

erights commented 7 months ago

On Mon, Apr 8, 2024 at 8:23 PM Jordan Harband wrote:

@erights https://github.com/erights i'm a bit confused about what your OP is proposing. Are you suggesting we flip back to "identifiers are irrefutable patterns",

Yes

and use a sigil to mark bindings, as opposed to let/const?

Not to mark bindings. Bare identifiers are bindings. ===expr uses === to mark an expression, including a normal variable name use occurrence. And without privileging === over other relational operators.

ljharb commented 7 months ago

If I understand properly, that was our original proposal - and yulia and the firefox team blocked it. The change to using let/const was to unblock the proposal.

erights commented 7 months ago

That is a huge thread! Where should I look? Or could you summarize?

The core problem is readability. You can see Yulia's analysis of the old syntax https://docs.google.com/document/d/1dVaSGokKneIT3eDM41Uk67SyWtuLlTWcaJvOxsBX2i0/edit

Thanks. Looked, but without having followed the history I'm having a hard time understanding this doc. Since I do understand the current proposal, which IIUC meets @codehag 's objections, could you write one of these examples using the current state of this proposal. Or any example that should illustrate the disadvantage of my suggestion. I'll then try to rewrite in terms of my suggestion and we can compare.

erights commented 7 months ago

Hi @codehag !

I think we have aligned notions of programmer cognitive costs, so I'm curious if we actually arrive at opposite conclusions. Do you support the current proposal over my suggestion? Could you show a maximally challenging example written in the current proposal, for me to rewrite in my suggestion?

codehag commented 7 months ago

Regarding that document, my purpose was not to use necessarily let or const, but rather to make it clear that a binding was taking place for the user, in a way that was more obvious. Aliasing in my opinion is an antipattern as it overrides notions that the programmer has about assignment: We should have instead added an as keyword or similar. But this is history.

After discussion with the champions, I was dismayed but convinced that we could reuse aliasing. Since it is being used in Extractors, if that advances I think it will be clear how we should advance here. My understanding in the calls was that the champions had their own reasons for having let/const. This was over a year ago. If it is only for my sake that we have let/const, it should be removed. It is clear that the tendency in the language goes a different way.

Do you support the current proposal over my suggestion? Could you show a maximally challenging example written in the current proposal, for me to rewrite in my suggestion?

The proposal authors have done a great job in trying to respond to my comments. My core concern was that the proposal is too complex, which they fundamentally disagree with. This was also what my proposed breakdown tried to do. We've moved quite far since there, but my core concern still applies. I think we should start with a simpler approach. In fact, I think that we can get very far in terms of pattern matching with extractors (it fulfills my concept of a base proposal, but this is a separate topic).

I am not sure how to communicate this. I support pattern matching in the language, and a lot of the work has been really great here. But my concern was not addressed. Not only for implementation concerns, but for usability and also to allow us to correct ourselves over time. The blocking concern, when I made it, was complexity -- not any particular syntax. Unfortunately, I think my attempt in being precise about my concerns resulted in the proposal becoming more complex, which wasn't my intention.

Of course, I also have personal thoughts here. In particular, we should take time with the pattern matching language, which is what you, @erights, are proposing to change in the second half of your post, iiuc. We have the potential to do something really great with this, and I'd like to see it fully fleshed out. It could be treated like RegExp. Another area where I disagreed with the champions is that this needs to be tied to a match statement. But it doesn't look like this is the direction things are going and I don't have much time now to help with the design. So I will set these thoughts aside rather than get in the way of those who are doing the work.

Regarding your suggestions, I think there are potential benefits, such as a closer relationship to existing syntax, and also drawbacks, such as searchability being generally worse for operators. Again, would love to see a deep investigation of a language for structural matching, without any concerns regarding the match statement.

Overall, I support simplification.

ljharb commented 7 months ago

@codehag fwiw in the current proposal the pattern DSL is not tied only to the match statement; there's also the is operator, and a lot of follow-on proposals planned.

codehag commented 7 months ago

Apologies! I am not up to date. I saw the proposal recently and I was still unsure about the level of complexity, but again, not planning on getting in the way here. I'd like to see a simple subset land first and build from that, but @ljharb you and I have had a call about this and we both know where we stand :). I won't rehash it.

erights commented 7 months ago

My sense is that destructuring + extractors is almost all of what I want from pattern matching itself. Just for pattern matching, the only deficiencies I see with destructuring + extractors alone:

How well could we write the equivalent of and, or, or not using extractors? If adequately well, I'd prefer to omit these from the grammar. But let's see what it looks like.

Other less urgent elements which I think I still favor, but probably could be talked out of. They could at least be postponed:

But, no matter whether we generalize destructuring to become useful as refutable patterns or we introduce a pattern notion distinct from destructuring, I think we will need something like the proposed match expression, for which the proposal itself is a fine place to start. OTOH, if we have ~= with the above scoping rules, then

match (specimen) {
  patt1: action1;
  patt2: action2;
  default: action3;
}

is much like

if (specimen ~= patt1) {
  action1
} else if (specimen ~= patt2) {
  action2
} else {
  action3
}

or even, for the expression case

specimen ~= patt1 ? action1 
  : specimen ~= patt2 ? action2
  : action3

The repetition of specimen need not be more painful than introducing a short variable name for purposes of repetition, if necessary. On reflection, I'd agree that it is not worth introducing the whole match syntax family merely to avoid the pain of repeating the specimen.

So, I favor enhanced destructuring centered on extractors, together with some of these additional enhancements. Perhaps we don't need a distinct notion of patterns or special match syntax. My main regret would only be that the array and object destructuring is too irrefutable. This by itself still may be sufficient to want a distinct notion of "pattern".
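
A short sketch of what "too irrefutable" looks like in today's JavaScript. `matchesPair` is a hypothetical helper written for this illustration, not part of any proposal.

```javascript
// Destructuring does not fail when the shape is wrong;
// missing pieces simply become undefined (or take their default).
const [a, b] = [1];      // b is undefined, no error
const { x, y = 0 } = {}; // x is undefined, y falls back to 0

console.log(a, b, x, y); // 1 undefined undefined 0

// The only refutation is a TypeError for null/undefined or non-iterables.
let threw = false;
try {
  const { z } = null;
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true

// A refutable "exactly two elements" check has to be written by hand.
function matchesPair(value) {
  return Array.isArray(value) && value.length === 2;
}
console.log(matchesPair([1]));    // false
console.log(matchesPair([1, 2])); // true
```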


I would still love to examine all this against motivating challenge examples. Please someone, fire away!

erights commented 7 months ago

Ah. Because we cannot parameterize an extractor inline, we cannot write extractor combinators as extractors, if we want the combinator expression inline, which we do. Sigh.
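
To make the difficulty concrete, here is a sketch under a made-up matcher protocol: a matcher is a plain function returning an array of extracted values on success and null on failure. This is not the extractors proposal's actual protocol, and all names below are hypothetical.

```javascript
// Basic matchers (hypothetical protocol: array on success, null on failure).
const isNumber = (v) => (typeof v === "number" ? [v] : null);
const isPositive = (v) => (v > 0 ? [v] : null);
const isString = (v) => (typeof v === "string" ? [v] : null);

// Combinators are matchers parameterized by other matchers.
const and = (...ms) => (v) => (ms.every((m) => m(v) !== null) ? [v] : null);
const or = (...ms) => (v) => (ms.some((m) => m(v) !== null) ? [v] : null);
const not = (m) => (v) => (m(v) === null ? [v] : null);

// The combined matcher has to be constructed (and usually named) first.
const positiveNumber = and(isNumber, isPositive);
const notString = not(isString);

console.log(positiveNumber(3));  // [ 3 ]
console.log(positiveNumber(-1)); // null
console.log(notString("a"));     // null
```

Building `and(isNumber, isPositive)` is a call expression. If a pattern can only name an extractor and cannot construct one in place, the combinator expression cannot appear inline, which seems to be the limitation the comment describes.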

ljharb commented 7 months ago

I definitely agree that object destructuring has a few mistakes in it, in particular using : instead of as for renaming.

erights commented 7 months ago

I definitely agree that object destructuring has a few mistakes in it,

agree so far

in particular using : instead of as for renaming.

torn. I think I disagree. But spilled milk under the bridge.

ljharb commented 7 months ago

at the least, hopefully we can agree that having "renaming properties" and "renaming imports" be different was a mistake - it consumes double the syntax space for the same conceptual operation.

erights commented 7 months ago

In light of the above conversation, here's a more minimal variant of my suggestion, without distinct MatchExpr and MatchPattern productions:

A.1 Expressions

Where the ~= expression in flow control conditionals introduces its bindings into the scope of the success case, including ?:.


Of the things I omitted, the ones I'm most torn about are the combinators. But we could start without these, and later consider further extending LeftSideExpr with them after we get more experience with the basics above.

ljharb commented 7 months ago

It's not easy for me to read spec grammar; can you provide some code examples?

erights commented 7 months ago

The first interesting example on this PR's front page:

match (res) {
  when isEmpty: ...;
  when {data: [let page] }: ...;
  when {data: [let frontPage, ...let pages] }: ...;
  default: ...;
}

rewritten

res ~= Empty() ? ... // Empty is extractor
  : res ~= { data: [page, ...Empty()] } ? ... // page in scope
  : res ~= { data: [frontPage, ...pages] } ? ... // frontPage, pages in scope
  : ...

Do you have some examples you'd like to see me rewrite? Bonus points if they show off the advantages of the current proposal over my simpler suggestion ;)
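
For comparison, a hand-written approximation of the same example in today's JavaScript. It rests on two assumptions that the thread does not confirm: `isEmpty` means "the data array has no entries", and the one-element pattern is meant to match only a single-element array. `render` and the returned strings are placeholders.

```javascript
// Assumption: "empty" means a response whose data array has no entries.
const isEmpty = (res) => Array.isArray(res?.data) && res.data.length === 0;

function render(res) {
  if (isEmpty(res)) return "empty";

  const data = res?.data;
  if (Array.isArray(data) && data.length === 1) {
    const [page] = data; // page in scope
    return `single: ${page}`;
  }
  // Length 1 was handled above, so "> 1" is what remains of "at least one".
  if (Array.isArray(data) && data.length > 1) {
    const [frontPage, ...pages] = data; // frontPage, pages in scope
    return `front: ${frontPage}, rest: ${pages.length}`;
  }
  return "default";
}

console.log(render({ data: [] }));              // empty
console.log(render({ data: ["a"] }));           // single: a
console.log(render({ data: ["a", "b", "c"] })); // front: a, rest: 2
console.log(render(null));                      // default
```

Each branch separates the shape test from the binding step, which is the duplication both syntaxes in this thread aim to remove.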

ljharb commented 7 months ago

oof, nested ternaries seems like an instant nope from me. the precedence confusion those cause is a big part of the reason why most JS styleguides/linter configs ban them.

erights commented 7 months ago

if (res ~= Empty()) { // Empty is extractor
  ... 
} else if (res ~= { data: [page, ...Empty()] }) {
  ... // page in scope
} else if (res ~= { data: [frontPage, ...pages] }) {
  ... // frontPage, pages in scope
} else {
  ...
}

ljharb commented 7 months ago

Gotcha. That has the downside of having to repeat res multiple times, and also means you can't use it in an expression position, which is a highly desirable aspect of this proposal.

erights commented 7 months ago

Understood. Agreed, these are the costs. The issue is weighing these against the costs of the rest of the complexity of the patterns proposal.

erights commented 7 months ago

Also, btw, do

ljharb commented 7 months ago

do expressions would handle that for sure, but we can't assume they're ever going to be in the language, so we have to design so pattern matching works well with and without them.

erights commented 7 months ago

To exaggerate a bit for a kind of clarity:

The above ?: chain works as an expression, and the above if chain works as a statement. We'd be happy with the if chain if it were an expression. We see three ways to address this dilemma:

Truly, once formatted with Prettier, I find tail-nested ?: chains quite readable. And this would be a very stylized use of them which would become a recognized idiom.

rbuckton commented 7 months ago

I personally have no issue with tail-nested conditionals, but conditionals do not provide irrefutability, while match does. In a match you can specify a leg for each case you expect using when. If you do not also supply a catch-all default clause, then any value outside of the specified cases should throw. With nested conditionals, you are forced to provide a default case. If you want the default case to throw, you must write that out yourself.
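
A small sketch of that last point in today's JavaScript. `noMatch` and `describe` are hypothetical names, and the status checks stand in for real patterns.

```javascript
// Hypothetical helper: lets a conditional chain throw in expression position.
function noMatch(value) {
  throw new TypeError(`no pattern matched: ${JSON.stringify(value)}`);
}

function describe(res) {
  // A conditional chain always needs a final branch, so the
  // "nothing matched" failure must be spelled out by the author.
  return res.status === 200 ? "ok"
    : res.status === 404 ? "not found"
    : noMatch(res);
}

console.log(describe({ status: 200 })); // ok
console.log(describe({ status: 404 })); // not found

try {
  describe({ status: 500 });
} catch (e) {
  console.log(e instanceof TypeError); // true
}
```

With a match construct that throws when no clause applies, the last branch would be implied and would not need to be written.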

rbuckton commented 7 months ago

There are quite a few things in the proposed syntax that do not have consensus amongst the champions and are included for illustrative purposes for the benefit of ongoing discussion.

To me, an MVP Pattern Matching proposal includes:

I would consider the following to be nice to have additions:

I don't believe any of the following features are necessary for an MVP, though they may be nice to have. The champions have mixed opinions here:

I have a strong preference for let/const patterns as we absolutely must be able to distinguish between a reference and a binding. Rust can get away with this because it has a type system, but it's still difficult to tell when reading a Rust pattern as to whether it introduces a binding or references a type. Three approaches were proposed to address this:

I believe the current syntax makes common cases very legible while explicitly calling out bindings:

match (opt) {
  Option.Some(let value): ...;
  Option.None: ...;
}

match (command) {
  [("up" or "down" or "left" or "right") and let direction, Number and let steps]: handleMove(direction, steps);
  ["jump", Number and let howHigh]: jump(howHigh);
}

vs the ${} version:

match (opt) {
  ${Option.Some} with [value]: ...;
  ${Option.None}: ...;
}

match (command) {
  [("up" or "down" or "left" or "right") and direction, ${Number} and steps]: handleMove(direction, steps);
  ["jump", ${Number} and howHigh]: jump(howHigh);
}

rbuckton commented 7 months ago

Regarding ~=, I find is to be more practically meaningful when reading code:

x is String
x is "foo" or "bar"
x is not null
x is C
x is not C

vs.

x ~= String
x ~= "foo" or "bar"
x ~= not null
x ~= C
x ~= not C

If we can use a meaningful keyword here, I'd rather do that and save sigils like ~= for cases where we maybe can't use a keyword.

is is also the same number of characters but is far easier to type since some non-US-english keyboard layouts do not include ~.

ljharb commented 7 months ago

hasOwn relational patterns are a very niche use case.

this is definitely untrue, as evidenced by the very high usage of "has own" packages in the ecosystem.

waldemarhorwat commented 7 months ago

We need something easily parseable to introduce the section with the new pattern matching syntax. Due to differing treatment of division vs. regular expressions, we can't use is or anything else that can be used as an identifier as an infix operator to mark the start of the new pattern matching syntax. That ship has already sailed — there are too many contexts where that would interfere with existing syntax and other proposals.

A new token such as ~= would work.

ljharb commented 7 months ago

@waldemarhorwat can you elaborate? why wouldn't we be able to add a new binary keyword? wouldn't the required whitespace disambiguate?

waldemarhorwat commented 7 months ago

It would not. There are plenty of existing contexts where an expression can be followed by an identifier, in which case it would become impossible to lex any farther. The plethora of existing and proposed cover grammars makes the situation worse due to the lexing problem. I am worried that these ambiguities will present a security problem, with different parsers treating the same text differently.

ljharb commented 7 months ago

can you give me an example? <expression> <identifier> <something else> doesn't seem like something that you can do currently.

waldemarhorwat commented 7 months ago

Here's a trivial example:

One of the proposals introduces a new cover grammar that allows void without an expression. That then allows us to construct things like:

void is /a/g

Now, is this a void expression whose argument is is, and the whole thing is divided by a and g? Or is it an argumentless void cover grammar used in an is expression whose pattern is the regexp /a/g? If it's the latter, the cover grammar would complain later, but we'd never get to that stage because we don't know which cover grammar to refine.
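
The first reading is how the text parses in today's JavaScript, where `is`, `a`, and `g` are ordinary identifiers. A minimal runnable check, with arbitrary values chosen for illustration:

```javascript
// `is`, `a`, and `g` are plain variable names in current JavaScript.
const is = 1;
const a = 2;
const g = 4;

// Parses as ((void is) / a) / g.
// `void is` evaluates to undefined, and undefined / 2 / 4 is NaN.
const result = void is /a/g;

console.log(Number.isNaN(result)); // true
```

After the identifier `is`, the lexer treats `/` as a division operator. The second reading would require lexing `/a/g` as a regular expression literal at the same position, which is the conflict being described.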

ljharb commented 7 months ago

Then that sounds like an argument against that proposal, because killing new binary keywords shouldn't be considered viable. Is there an example that's actually in the language?

waldemarhorwat commented 7 months ago

Yes, there are numerous similar examples in the language. Due to past syntax decisions, the ship for introducing new binary keywords without them being reserved words has sailed.

ljharb commented 7 months ago

I'm sorry to keep repeating this, but can you share one of these examples?

waldemarhorwat commented 7 months ago

The thing that really worries me is this rule:

CoverCallExpressionAndAsyncArrowHead[?Yield, ?Await] [no LineTerminator here] { MatchExpressionClauses[?Yield, ?Await] ; }

This will work once, but the juxtaposition of the cover grammar with the match syntax will preclude all future attempts to introduce new syntax consisting of the form keyword (expr) {block}.

We've gotten ourselves into a corner.

ljharb commented 7 months ago

The current proposal's spec text can be adapted any way needed. I agree that we need to ensure that that form can be added in the future, as well as that new unreserved binary keywords can be added, and any proposal that would disrupt that should not advance until the problem has been resolved.

waldemarhorwat commented 7 months ago

@ljharb: Example: let is. This one can be fixed with more line break restrictions, but there are others I'm working on constructing which can't.
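
The thread does not spell out how the conflict arises, so the sketch below only shows the existing behavior that an infix `is` would have to coexist with: `is` is a legal binding name, and in sloppy (non-strict) code `let` is itself a legal identifier. `declareIs` and `readLet` are names made up for this illustration.

```javascript
// `is` is not a reserved word, so `let is` begins a declaration
// of a variable named `is`.
function declareIs() {
  let is = "a variable";
  return is;
}

// In sloppy mode, `let` can be used as an identifier.
// Bodies passed to the Function constructor are sloppy unless they
// opt in to strict mode, so this runs even from strict or module code.
const readLet = new Function("var let = 42; return let;");

console.log(declareIs()); // a variable
console.log(readLet());   // 42
```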

rbuckton commented 7 months ago

It sounds like we would need to rethink some of the cover grammars, not abandon infix keywords.

waldemarhorwat commented 7 months ago

We can have either prefix non-reserved keywords or infix non-reserved keywords. They're generally mutually exclusive. Given a blank slate, we could have decided on one or the other, but we don't have a blank slate. We already have prefix non-reserved keywords to introduce various things in the language and some more in proposals in the pipeline. Trying to introduce infix non-reserved keywords would create problems for prefix non-reserved keywords.

ljharb commented 7 months ago

Proposals in the pipeline, pre-stage 3, can (and should) be changed if they're closing off design space. What are the ones in the language? (you mentioned let, so presumably var and const, but i'm not sure why that precludes new keywords?)

waldemarhorwat commented 7 months ago

Proposals closing off the design space include things like explicit resource management, which is in stage 3, as well as a bunch of things already in the language. But the patterns proposal's current form would close off more of the design space than any of those — see my comment about the match syntax above. If we're worried about closing off the design space (and I am), we should fix the syntax of the patterns proposal to not use keywords usable as identifiers to mark patterns. Things like ~= are one of the options to preserve the design space.

ljharb commented 7 months ago

Can you provide a list of the things already in the language?

waldemarhorwat commented 7 months ago

Making that list would be an important effort. Finding them all would require computer validation of the grammar, including cover grammars. I wrote such a thing for an earlier version of ECMAScript, but it would take me a while to bring it up to date.

hax commented 7 months ago

at the least, hopefully we can agree that having "renaming properties" and "renaming imports" be different was a mistake - it consumes double the syntax space for the same conceptual operation.

@ljharb I can't remember who said that, but it has been said that destructuring is assignment while import is aliasing, so they use different syntax to denote the semantic difference. Though I can't tell whether I agree or not.

gustavopch commented 7 months ago

FWIW, if destructuring used as, we'd probably be able to write:

const sayHello = ({
  to as name: string
}) => { ... }

instead of:

const sayHello = ({
  to: name
}: {
  to: string
}) => { ... }

But of course it's too late. :/