tc39 / proposal-pattern-matching

Pattern matching syntax for ECMAScript
https://tc39.es/proposal-pattern-matching/
MIT License
5.46k stars 89 forks source link

Should we close iterators after they've been used in pattern matching? #235

Closed theScottyJam closed 2 years ago

theScottyJam commented 2 years ago

In #234, @tabatkins shared a code snippet that, frankly, I find a little scary.

when ([head, ...] & tail)

Here's a version that's just as scary, that doesn't rely on syntax that was being proposed in the other thread.

when ([] | tail)

In regards to the first example, Irrefutable matches + the "&" combinator is supposed to be a simple way to achieve a "pattern match + bind" effect, and yet, that's not necessarily what's going on here despite its appearance. If you're matching against an array, then yeah, "tail" would be equal to the entire array, and "head" would be equal to the front part. But if you throw an iterator in there, then "tail" would then be an iterator that's been advanced by one step.

when ([head, ...] | tail) // "tail" would be an iterator that has advanced one step
when ([head] | tail) // "tail" would be an iterator that has advanced two steps
when ([head, ...etc] | tail) // "tail" would be an iterator that's completely exhausted
when ([] | tail) // "tail" would be an iterator that has advanced one step

A decision that's currently up in the air is whether or not the match construct should close the iterators it uses. The only reason why we might leave an iterator open is to enable people to write this sort of difficult-to-understand code (any code that's written like this requires developers to have an intimate knowledge of how the caching behaviors work in pattern-matching, something that's supposed to be a mostly-invisible artifact). Instead, we can auto-close all iterators that the match construct partially consumes before we hit the RHS of a match arm. If a developer really, really wants to write this sort of code, they can always wrap their iterator in another one - the outer iterator will be auto-closed by the match statement, but the inner one will remain open and usable in the RHS of a match arm - this means we're not removing any features, we're just making it inconvenient to write nasty code, forcing developers to think twice about it.

ljharb commented 2 years ago

... is a bit different.

In the general case, once an iterator is exhausted, it certainly should close it, since (i think) that's what array destructuring does.

tabatkins commented 2 years ago

Once exhausted, yeah; a ...rest pattern should close the iterator. But no rest pattern, or a ... shouldn't, I think?

If we auto-close all iterators, that implies we can't control the behavior at the pattern level, either - authors need to supply the fix at the data level.

ljharb commented 2 years ago

Absent #234, yes, it should always close.

Indeed, the entire value of #234 seems coupled to it not closing the iterator.

tabatkins commented 2 years ago

I suppose I don't actually have an opinion on whether [a] will close the iterator. If it does, but [a, ...] doesn't, then that seems workable to me.

theScottyJam commented 2 years ago

@ljharb Not necessarily. The main value of #234 is to let you match against potentially infinite iterators. You don't need to continue to use the iterator after you've matched against it for #234 to provide value.

@tabatkins - hmm, perhaps I'm ok with that behavior as well (though it would be my favorite choice). It would at least mean developers don't have to understand that [a] consumes two items, not one.

ljharb commented 2 years ago

Actually wait, it can't close the iterator mid-matching, otherwise this wouldn't work:

match (x) {
  when ([a]) { … }
  when ([a, b, c]) { … }
}

because the iterator would be closed before it got to the second when.

theScottyJam commented 2 years ago

It would close it right before executing the RHS. So, if the first when fails, the iterator will still be open, because it didn't try to execute a RHS. This does mean it'll also be open for the duration of a guard clause.

tabatkins commented 2 years ago

Right, not mid-match, definitely. @theScottyJam's timing sounds reasonable.

And yeah, it needs to be open for the guard clause, since that's part of matching.

theScottyJam commented 2 years ago

Actually, @tabatkins, we can't have different behaviors depending on if ... is used, because you could have code that both uses and not uses ... on the same iterator. e.g.


match (thing) {
  when (([2, 3, ...] | [4, 5]) & theIterator) ...
}

// or
match (thing) {
  when ([2, 3, 5, 6]) ...
  when ([4, 5, ...] & theIterator) ...
}
tabatkins commented 2 years ago

That isn't necessarily a problem - you already have non-deterministic iterator behavior depending on how far down the match clauses you get. We could similarly keep a bit around to track whether a cached iterator has been iterated by a closing pattern or not.

ljharb commented 2 years ago

We could also just not close iterators until the match construct concludes?

theScottyJam commented 2 years ago

At that point, why close it at all? That wouldn't discourage people from writing the kind of awkward code I originally presented. It would just make for awkward behavior if someone tried to use the iterator after the match construct, instead of within the RHS of a match arm.

ljharb commented 2 years ago

I don't think we need closing behavior to discourage people from writing that kind of code; if that code passes linters and code review then their codebase is in trouble no matter what we do.

theScottyJam commented 2 years ago

I mean, it's true.

But if we have the option to close the iterator, and we have no reason not to close it, then why not close it? Unless, do you have a specific reason why you think it would be beneficial to leave it open? This would be a place where I would love to see if we can come up with a code snippet that showcases a good usecase for keeping the iterator open in the RHS. If we can't think of one, then we probably shouldn't leave it open "just in case" someone can find a good use for this dangerous ability.

It's not like closing the iterator is an unnatural thing to do in these sorts of language constructs either, for-of does it too.

edit: In fact, it might be good to see if good use cases for using the iterator in the RHS exist, it would really help this discussion, and would help us to know what the best solution would be (if we should completely close all iterators, or try and leave an iterator open when ... is used, or always leave them open if they're not completely exhausted)

bathos commented 2 years ago

it might be good to see if good use cases for using the iterator in the RHS exist

Lexers written as generator functions that yield matchable symbols. For LL1 grammars, match seemingly presents an API surface so expressive that it looks like it was purpose-built: you can get 1:1 alignment with grammatical notation where each match corresponds to a nonterminal, each when to an alternative, etc. No noise, perfect processing hooks.

Calling return isn’t itself a problem. How an iterator responds to return() is an author-responsibility, not a consumer responsibility, and if an iterator should just never be exhausted by syntactic use, that’s as simple as myGenFn.prototype.return = undefined.

But the particulars of how array patterns get consumed and if/when/where return would be called do seem to matter a great deal for a pattern like this (and for pretty much any other generator that isn’t just a fancy way to create a single array). If the non-binding, non-consuming version of ... is added, I think it would work (regardless of whether return is called, since again it’s already straightforward to opt-out of syntactic autoclosing). If the non-consuming ... isn’t added, though, I’m not sure how it would play out. You would need return(), and you would specifically need it to be called before the RHS is evaluated, because now you actually do have cleanup to do: “unshifting” the extra symbol match stole. But maybe it just wouldn’t be able to work at all without ...?

No idea what’s best* but given @theScottyJam’s comment, I figured I’d offer what I think is a realistic example of something someone (me) will attempt as soon as it becomes possible (if it does).

* But am in favor of non-consuming .... Not cause of this, just because it’s almost always the behavior I’d want. Specific length matching is useful, but it’s hard to imagine using it more than 1% of the time, at least outside of functional programming recursion pattern stuff? When I match { foo } I’m not saying the object can’t also have a bar property.

theScottyJam commented 2 years ago

Thanks @bathos for chiming in! That perspective is certainly helpful, and I can definitely see a use case like that.

Your comment about wanting to use ... almost all of the time is interesting as well - the assumption has been the opposite, which is why the current matching semantics for array works the way they do. If this assumption is faulty, and people want to not have length checks done more often than not, then perhaps it's worth considering making arrays-patterns not do length checks, then we can, perhaps, lean on tuple-patterns to enforce a length check (assuming you're allowed to use tuple patterns on arrays, which perhaps you can't). @tabatkins mentioned tuple destructuring in passing here, and I'm not sure if this is what was in mind when it was said, or not, but it's a potential avenue that could be explored back in that thread.

tabatkins commented 2 years ago

When you're matching against arrays rather than iterators, length-matching is usually what you want, and I think the footgun potential for [a] matching a long array to be more confusing/common than for [a] drawing two items from an iterator (and closing it afterwards).

This can change depending on individual's personal usage patterns! But I think that this is true overall.

When I match { foo } I’m not saying the object can’t also have a bar property.

Note that this was requested, in fact, with rest patterns relaxing it, precisely because it differs from array patterns currently. We can't reasonably do it, since our object patterns look up the prototype chain rather than just looking at own keys, but it's not an unreasonable thing, and some languages do use this in their pattern syntaxes.


That said, yeah, parsing with iterators is absolutely a big use-case I want to make sure works well in pattern matching, and ... is required for that to reasonably work, I think.

bathos commented 2 years ago

the assumption has been the opposite, which is why the current matching semantics for array works the way they do

With non-consuming ..., I personally wouldn’t find it weird if length-check is the default behavior — mainly because

So while I think I’d be writing ... more often than not, it doesn’t seem like a nuisance.

The only downside seems to be a risk of encouraging oddly brittle APIs when people don’t realize they need ... or ...rest to opt out of length-sensitivity. You seemingly can’t extrapolate the “only these” behavior from how property matching works + it’s the kind of quirkiness that would be easy to not realize needs a test. On balance, I think the clarity of the current syntax is worth this minor risk, though.

I am curious though what led to the conclusion that length-sensitivity would be the common case — that seems surprising to me.

tabatkins commented 2 years ago

I am curious though what led to the conclusion that length-sensitivity would be the common case — that seems surprising to me.

For one, it's the common case in most languages' pattern constructs, afaik. Matching other languages is good when we can do it, for knowledge transference reasons.

For two, without length checking, then in when ([a]) ...; when ([a, b]) ...; the first clause fully shadows the second one, which seems confusing and unexpected. You'd have to write your patterns in the order when ([a, b]) ...; when ([a]) ...; to get them to both actually be reachable.

The shadowing still occurs in when ([a, ...]) ...; when ([a, b, ...]) ...;, but that seems reasonably intuitive and expected, at least in my opinion.