...
must consume the entire iterator; without that, it needn't consume the whole thing.
I was quoting the current README: Array/iterable destructuring (emphasis mine)
These contain a comma-separated list of zero or more patterns, possibly ending in rest syntax (like ...rest). This pattern first verifies that the matched value is iterable, then obtains and consumes the entire iterator.
This behavior is patterned after array destructuring. Array destructuring will likewise consume the whole iterable, whether or not you use the whole thing.
> function* f() { yield 1; yield 2; yield 3 }
> g = f()
> const [a] = g
> a
1
> g.next()
{ value: undefined, done: true }
If we want to stay consistent and provide the least surprising behavior possible, we need to also consume the whole iterable during pattern matching, which means we need to cache the result, so it can be used in each match arm.
Otherwise, you can't do this:
const data = ({ a: 1 })
match (Object.keys(data)) {
  when ([key1, key2]) { ... }
  when ([key1]) { ... } // We wanted this to match, but it won't, because the iterator got consumed on the first match arm.
  when ([]) { ... } // Whoops! This will be the one that matches
}
Ah, fair enough. Exhausting the iterator seems unavoidable then, given array destructuring.
The caching is also unavoidable IMO, because iterators aren’t reusable.
Array destructuring will likewise consume the whole iterable
It's actually more complicated... array destructuring "IteratorCloses" the iterable without consuming it (which is actually necessary to destructure infinite iterators), by invoking return (which defaults to terminating iteration for built-in-constructed iterators, including iterators returned from generators, but can be arbitrarily overridden). However, that is irrelevant for arrays, which return a fresh iterator every time one is requested, and therefore your Object.keys example works as expected if the when clauses are independent of each other.
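A minimal runnable sketch of that IteratorClose behavior (naturals here is just an illustrative generator, not anything from the proposal):
function* naturals() {
  let n = 0;
  while (true) yield n++;
}
const gen = naturals();
const [first] = gen;     // pulls one value, then IteratorClose invokes gen.return()
console.log(first);      // 0
console.log(gen.next()); // { value: undefined, done: true } -- closed, not drained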
I believe the caching is not only avoidable, but should be avoided. Regardless, if it is a part of this proposal, then there's a whole lot to address, including what exactly is cached, by what key, and how the cache responds to reference reuse and/or dynamic mutation... and the result is definitely not going to be broadly intuitive. Supporting speculative destructuring to accommodate non-array-like iterators is weird, and will add lots of unnecessary complexity.
@gibson042 would you agree that in the case where ... appears in the LHS of a match clause's top level array destructuring pattern, the iterator must be exhausted?
would you agree that in the case where ... appears in the LHS of a match clause's top level array destructuring pattern, the iterator must be exhausted?
I would agree that an iterator must be exhausted in that case, but not that such an iterator must necessarily be shared across clauses.
I can’t conceive of why you wouldn’t want the iterator’s results shared - you’re matching against a single iterator, so that iterator mustn’t be called more than once.
match (Object.keys(data)) isn't matching against a single iterator, it's matching against a single array.
That is true, but in a normal array case, the behavior of caching vs not caching would be the same. Do you have a (non-pathological) use case where caching gets in the way?
I would describe as pathological any case where caching behavior (or lack thereof) is observable, including matching over a non-array-like iterable.
Why? Lots of things in the language are observable - including the current lack of caching in iteration - if you write an iterator to observe it - which is what I’d call pathological.
Do you have an actual use case where the semantics - unrelated to observability - are impacted?
I would expect everyone else to avoid such sharp edges, just like approximately no one tries to iterate over a value returned by map.keys() multiple times or mutate an array during its iteration, even though the behavior governing such operations is well-defined. There's only a problem if a problem is created by caching, which will work as expected until it doesn't. I don't have a real use case that calls for pattern matching a non-array-like iterable at all, and suspect that the very idea is an anti-pattern for the above reasons. And for those times when something tricky is going on, the specifics of when cached data is used and when it is not will be surprising and potentially detrimental.
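For reference, that map.keys() sharp edge looks like this (a plain runnable illustration, unrelated to match semantics):
const m = new Map([['a', 1], ['b', 2]]);
const keys = m.keys();
console.log([...keys]); // ['a', 'b']
console.log([...keys]); // [] -- the iterator was already spent by the first pass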
Doing my best at answering your question, though, the presence of caching for stream-like iterables suggests support for code like the following, which is (deceptively?) greedy and should instead be refactored for better stream-orientation.
while (true) {
  match (utf16CodeUnits) {
    when ([ leading, trailing, ...tail ]) if (isLeadingSurrogate(leading) && isTrailingSurrogate(trailing)) {
      emitCodePoint(0x10000 | ((leading - 0xD800) << 10) | (trailing - 0xDC00));
      utf16CodeUnits = tail;
    } when ([ ch, ...tail ]) {
      emitCodePoint(ch);
      utf16CodeUnits = tail;
    } else {
      break;
    }
  }
}
And I also just noticed from the above example that proper support for iterables would probably require pattern matching against the head without complete consumption.
I'm not clear on what that example is doing; the whens shouldn't be stackable in that way.
If utf16CodeUnits is an iterable, then that could be fine - but if it's an iterator then it's a category error to be matching against it in a loop.
Whoops, I intended my Object.keys() example to match against an iterator, not an array. But you can turn the array into an iterator and get similar weird behavior:
match (Object.keys(data).values()) {
...
}
Here's another example. Expected behavior would be that [...iterator] and iterator would behave the same inside a match expression, but notice that this is not the case if we don't consume the whole iterator and cache the results.
const groupLineComponents = components => match (components) {
  when ([command, arg1, arg2]) {({ command, args: [arg1, arg2] })}
  when ([command, arg1]) {({ command, args: [arg1] })}
  else { throw new Error('Invalid line') }
}

function* tokenizeLine(line) {
  // Simplified for brevity
  yield* line.split(' ')
}

groupLineComponents(tokenizeLine('GOTO 12')) // Error: Invalid line
groupLineComponents([...tokenizeLine('GOTO 12')]) // { command: 'GOTO', args: ['12'] }
I'm not clear on what that example is doing; the whens shouldn't be stackable in that way.
@ljharb What do you mean? How is my example structurally different from this one?
If utf16CodeUnits is an iterable, then that could be fine - but if it's an iterator then it's a category error to be matching against it in a loop.
That's kind of my point (and as an aside, it's more difficult than I thought to write intentionally incorrect code). It seems like you're agreeing with me: shouldn't each independent array destructuring attempt invoke the relevant @@iterator method and use its result (which for built-in non-iterator iterables is always a new object)?
Here's another example. Expected behavior would be that [...iterator] and iterator would behave the same inside a match expression
@theScottyJam I don't think I have this expectation, just like I don't have an expectation that when ({ stable: true }) will match after when (obj) if (delete obj.stable === false). But I will note that caching iteration results from the matchable itself is much less problematic than caching results from its components, which is my primary concern.
i strongly disagree that each arm should separately invoke @@iterator. that operation is not, even in idiomatic usage, idempotent.
@gibson042 what I’m saying is that a match construct is not meant to be used like that - to progressively move through an iterator’s state. You could use it in a for..of on a yielded item, but it’s not for partially scanning through an iterator.
For any use case I can think of, either only one of iterator/[...iterator] is supported by a particular operation, or, if they're both supported, they do the same thing. After all, they're both conceptually trying to represent a list of elements, one of them is just lazy.
@theScottyJam I don't think I have this expectation, just like I don't have an expectation that when ({ stable: true }) will match after when (obj) if (delete obj.stable === false)
This delete example is the type of code I would want to avoid in a match arm. I naturally wouldn't expect state changes to happen from branch to branch, and I might be caught off guard and not notice if a programmer is writing code like that (the delete keyword is easy to spot, but a call to a state-changing function, for example, would be much harder to notice). Ideally, you would keep everything pure within each arm, as that's how it's supposed to conceptually operate. Of course, the end user has the power to do non-pure stuff within the if guard, but they should reasonably do what they can to put any impure logic into the body of the arm. Similarly, we should reasonably do what we can to avoid state changes from arm to arm, as that goes against the philosophy of how a match expression is supposed to operate.
Maybe it would help if we address some of your concerns about the complexity of caching.
if it is a part of this proposal, then there's a whole lot to address, including what exactly is cached, by what key, and how the cache responds to reference reuse and/or dynamic mutation... and the result is definitely not going to be broadly intuitive. Supporting speculative destructuring to accommodate non-array-like iterators is weird, and will add lots of unnecessary complexity.
I think you're overly complicating the mental image of how this caching is supposed to work. I imagine it'll be something like this:
match (['a', 'b'].values()) {
  when ([x, y]) { ... }
  when ([x]) { ... }
}

// is the same as

match ([...['a', 'b'].values()]) {
  when ([x, y, ...etc]) { ... }
  when ([x]) { ... }
}
That's it. That's how "caching" is done. It wasn't that difficult here. We didn't need to worry about cache keys or anything like that; that's irrelevant to how this operates. The "caching" term just means that the results of consuming the iterator are held in a temporary for the lifetime of the match expression.
There are some minor details that still need to be worked out, like how much of the iterator to consume when match arms don't use all of the resulting values (i.e., they're not using a spread to capture all results). Should the needed amount be consumed up-front, or on an as-needed basis? I imagine it would be the latter.
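To illustrate what on-demand caching could look like, here is a hypothetical userland helper (cachedIterator, its at method, and logged are invented names; this is a sketch of the idea, not the proposal's actual mechanism):
function cachedIterator(iterable) {
  const it = iterable[Symbol.iterator]();
  const seen = [];
  let done = false;
  return {
    // Return the element at `index`, pulling from the underlying iterator
    // only if that position hasn't been cached yet.
    at(index) {
      while (!done && seen.length <= index) {
        const result = it.next();
        if (result.done) done = true;
        else seen.push(result.value);
      }
      return index < seen.length ? { present: true, value: seen[index] } : { present: false };
    },
  };
}

function* logged() {
  for (const v of ['a', 'b', 'c']) { console.log('pulled', v); yield v; }
}

const cache = cachedIterator(logged());
cache.at(0); // pulled a
cache.at(1); // pulled b
cache.at(0); // no pull: position 0 is already cached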
Yes, I agree 100% that stateful operations inside the construct are a bad idea and should be avoided; the example at the top of this issue is explicitly identified as pathological. But destructuring a stream-like iterable is inherently stateful, and if using them with match is considered reasonable then that requires making tradeoffs around complexity, intuition, and utility.
We didn't need to worry about cache keys or anything like that, that's irrelevant to how this operates.
@theScottyJam as I said, caching iteration results from the matchable itself is much less problematic than caching results from its components. But that's not what's presented in the README, which dives into the matchable and becomes subject to these additional concerns. Behavior will be observably different if the cache key is e.g. "the matchable's data property" vs. "the value returned from [[Get]]ing the data property" vs. "the value originally returned from [[Get]]ing the data property".
Well, perhaps the most natural intuition would be that the first match arm will consume the iterator and close it, just like how it's done with destructuring, and then no other match arm can match against the result of the iterator. But that makes pattern-matching against iterators pretty useless, and causes it to go against the natural intuition that pattern matching should be stateless. So, we'll need to deviate from that somehow.
If we deviate by gradually consuming the iterator in each match arm, then we're still left with a not-so-useful feature that goes against the intuition that pattern matching should be stateless from arm to arm.
If we deviate by allowing each match arm to match against the results of consuming the iterator, then we've got a very useful feature that follows the intuition that pattern matching should be stateless from arm to arm.
Yes, in all of these scenarios, there will be the unfortunate and unavoidable state change that the iterator will be completely consumed after the matching is done. But at least we're not modifying state between match arms.
@theScottyJam as I said, caching iteration results from the matchable itself is much less problematic than caching results from its components. But that's not what's presented in the README, which dives into the matchable and becomes subject to these additional concerns. Behavior will be observably different if the cache key is e.g. "the matchable's data property" vs. "the value returned from [[Get]]ing the data property" vs. "the value originally returned from [[Get]]ing the data property".
Ah, I see. You're talking about the case where the iterator is inside an object. I see this feature as something that keeps the proposal consistent but should probably not be used. I assume the cache key would just be the generator itself (we cache by reference): as we're pattern matching, if we run into a generator, we check to see if it's in our cache. If it is, we use the cached results, and consume (and cache) more as necessary. If it's not, then we start consuming it and caching the results. At the end of the match expression, we close all iterators.
Of course, this hasn't actually been discussed and it would be good to open an issue to figure out the fine details of how this would work. But I can't think of any other reasonable way to make this happen.
We wouldn’t ever do anything based on “generator”; a generator is just a kind of iterator. All iterators would be treated the same, just like array destructuring syntax.
The cache mechanism certainly hasn't been decided, but yes, it would ensure that only one Get() is called for any property - if you have a getter that doesn't return the same value when called multiple times (i.e., a pathological one), you'll find that only the first result is matched against.
The cache mechanism certainly hasn't been decided, but yes, it would ensure that only one Get() is called for any property - if you have a getter that doesn't return the same value when called multiple times (i.e., a pathological one), you'll find that only the first result is matched against.
Ah, so caching applies to property access in addition to iteration results? And presumably on first access, such that when ({ stable }) will match after when ({ stable } as obj) if (delete obj.stable === false) but not after when (obj) if (delete obj.stable === false)? Should such caching be visible throughout the entire construct (including if guards), or only in match patterns, or even only in destructuring match patterns (i.e., not for pin or protocol patterns)? And presumably not for references that leave the matchable's graph, right?
const matchable = {stable: true};
match (matchable) {
  when ({ stable: true } as obj) if (delete obj.stable === false) {…}
  when (/*matches by caching?*/ { stable: true }) if (/*fails? 🤞*/ matchable.stable) {…}
}
The caching would only apply to match patterns - destructuring would, as always, do a fresh Get or IteratorNext etc.; an if guard conditional is a normal expression context; and the RHS is just a normal statement list.
Also, a nested match construct would presumably not inherit any of the caching from its parent match construct.
Actually, side effects on iteration are just a special kind of side effect. You can also have other side effects by matching on a Proxy (it will invoke a proxy trap). So I guess the side effects here are acceptable.
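As a plain, runnable illustration of that point (this uses ordinary destructuring, since a match pattern reading the same property would observably hit the same trap):
const target = { data: [1, 2, 3] };
const observed = new Proxy(target, {
  get(t, key, receiver) {
    console.log('get trap:', key);
    return Reflect.get(t, key, receiver);
  },
});

const { data } = observed; // logs: get trap: data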
The champions group met to discuss this yesterday, here's where we're at:
[…] caching of Has/Get results. If it becomes infeasible, the fallback for iterator caching will be that builtin iterators get repeatedly consumed, and non-builtin iterators are simply not usable* with pattern matching.
*usefully usable, that is.
A question: if the current match semantics require that the number of items also match, does that mean it's impossible to match 4 items from an infinite iterator?
match (Number.range(0, Infinity)) {
  when ([a, b, c, d]) -> can I match here?
  else -> ...
}
It’s not that the total length has to match - any iterator with at least 4 items would match, just like in array destructuring. No?
@ljharb - I don't think it works like that. (This was one of those inconsistencies with destructuring I had complained about previously - but I'm good with it now)
The README states this:
match (res) {
  if (isEmpty(res)) …
  when ({ data: [page] }) …
  when ({ data: [frontPage, ...pages] }) …
  else { … }
}
[...]
The first when clause matches if data has exactly one element, and binds that element to page for the right-hand side. The second when clause matches if data has at least one element, binding that first element to frontPage, and an array of any remaining elements to pages.
I think the best that can be done, as the proposal currently stands, is to use the take() method from the iterator-helpers proposal, e.g.
match (Number.range(0, Infinity).take(4)) {
  when ([a, b, c, d]) ...
  else ...
}
This can't cover all use cases, and it's not pretty, but it should help with a good number of them.
Ahhh right, I see what you mean - I remember now :-) there's good reasons for this, yes, and it means that the match construct has to take 5 items to know if the 5th is a "done" or not. I agree that iterator helpers somewhat solve this with .take and similar.
In general, my feeling on infinite iterators is that I do not care about supporting them, and I don't ever want to make normal use cases harder in order to do so, but we should never go out of our way to make infinite iterator use cases harder either. I think the current proposal (in the readme) satisfies this.
Yes, an iterator has to attempt to consume five items to correctly match an [a,b,c,d] pattern, to ensure that the fifth attempt fails and the iterator represents exactly four elements. The current spec (well, the README) asserts that you must capture the leftover elements with a spread if you want to do "four or more"; we haven't specified exactly what will happen if you try to match [a,b,c,d, ...rest] against an infinite iterator.
The "helpful" behavior would be to bind rest to the iterator itself (having been progressed by four items), but that gets... complex, with our intended iterator caching semantics. Instead, I suspect the answer is that we have to consume the entire iterator, hanging on infinite iterators.
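As a plain runnable illustration of why five pulls are needed (fourItems here is just an arbitrary example generator):
function* fourItems() { yield* [10, 20, 30, 40]; }
const it = fourItems();
// Four pulls read the values; a fifth is required to observe done and
// confirm the iterator held exactly four items.
const pulled = [it.next(), it.next(), it.next(), it.next(), it.next()];
console.log(pulled.map(r => r.done)); // [false, false, false, false, true]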
I suspect we want to add a bare ... pattern to indicate pure optionality. This is potentially useful for matching against plain arrays too (you won't create a superfluous binding for the rest of the array if you don't want it), but it also means that [a,b,c,d, ...] only has to consume four items from an iterator to confirm a match, and works well for infinite iterators.
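Here is a sketch of how that might read, using the proposed (and not yet settled) bare ... rest pattern:
match (Number.range(0, Infinity)) {
  when ([a, b, c, d, ...]) { ... } // only four items are consumed; the rest of the iterator is ignored
  else { ... }
}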
The only possible behavior there is the same thing as destructuring - it deadlocks on an infinite iterator. If we’d want to handle infinite iterators better, we’d need to do so in destructuring and maybe for-of, and then it would work in pattern matching automatically. We should not try to solve this problem as part of pattern matching.
No, we have a unique problem that needs solving. In destructuring, [a, b, c] works just fine on an infinite iterator, pulling off three values and ignoring the rest. In pattern matching, [a, b, c] will fail to match against an infinite iterator. If you want to reproduce the destructuring situation, you need [a, b, c, ...] or something similar, to relax the length constraint but still ignore the rest of the iterator.
[a, b, c, ...rest] will indeed deadlock in both destructuring and pattern-matching, and that's good.
That's a decent argument, but I also don't think it's worth supporting infinite iterators sufficiently to warrant adding more complexity to this proposal. I'd prefer to see it as a follow-on.
I disagree; as it currently stands it is impossible to use infinite iterators with array patterns, not just awkward or inconvenient. They are guaranteed to either deadlock or not match. Adding ... as a possible rest pattern does not inconvenience any other use-case, but makes infinite iterators possible.
It also helps with non-infinite cases. For finite iterators, you're trapped in the same dilemma: either the iterator has to be an exact length match (against [a, b, c]), or you're required to exhaust the entire iterator (against [a, b, c, ...rest]). There's no way to pull off N values and allow additional values without consuming the entire iterator. [a, b, c, ...] would allow this.
This is a general problem caused by the mismatch in behavior of [a, b, c] between destructuring and pattern matching, and affects several cases.
If it becomes infeasible [...] the fallback for iterator caching will be that builtin iterators get repeatedly consumed, and non-builtin iterators are simply not usable with pattern matching
I hope that this proposal doesn't really lead to builtin iterators being granted special syntactic hooks - that sounds like the return of pre-meta-object-protocol/@@toPrimitive/etc magic that prevented self-hosting, polyfilling, and realm patching. :/
No, by "built-in iterators" we mean the iterables like Array, where each time you ask for an iterator you get a fresh one that goes over the array again, versus straight iterators like calling a generator function, which return themselves each time you ask for an iterator and thus can't be repeatedly iterated in a useful manner.
For finite iterators, you're trapped in the same dilemma
this is a fair point; i wonder if maybe we should revisit the length-checking in light of that.
No, the length-checking itself is a very important functionality that most (all?) pattern-matching constructs use, and absolutely should not be removed. The length of an array is a significant thing people intuitively match against; without it, you have to either add guards doing the length check manually (ugh) or be careful about ordering your clauses so the longer ones come first and the end patterns won't match against undefined (huge footgun).
This is just one of the places where the mental model and common usage of destructuring and of pattern matching differ.
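For example, here's a sketch of the clause-ordering footgun under hypothetical no-length-check semantics (not the proposal's actual behavior):
match (['a', 'b']) {
  when ([x]) { ... }    // would match here, silently ignoring 'b'
  when ([x, y]) { ... } // would never be reached
}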
No, by "built-in iterators" we mean the iterables like Array, where each time you ask for an iterator you get a fresh one that goes over the array again, versus straight iterators like calling a generator function, which return themselves each time you ask for an iterator and thus can't be repeatedly iterated in a useful manner.
The picture is still fuzzy for me. My concern was about whether the what-we-would-do-instead case implied a syntactic feature that would bypass MOP operations, i.e. whether one would be unable to implement an iterable that would be accepted by a match pattern. That kind of special elevation may not be what's being proposed, but I'm not sure what is being suggested instead.
Would the distinction being made be iterable vs iterator (contractual), or would it be builtin vs non-builtin (branding*)? If it's the former, how would this work given every iterator that inherits from %IteratorPrototype% is also an iterable? Or is it something else entirely?
* if branding is "elevated to syntax", that's where self-hosting breaks
@bathos i'm pretty sure none of those concerns are in danger; there's a preexisting reality that builtin iterables' iterators are reusable, but userland iterators (from generators) are not, altho iterators in general are not reusable.
Yeah, you're overthinking this. ^_^ There is no brand-checking, branching on anything, or anything else of the sort, we're just talking about the consequences of how iterables and iterators work.
If we fail to get the caching semantics to work, then whenever you try to match something against an array matcher, it'll ask the matchable for an iterator. Arrays/etc return a fresh iterator each time you do so, so the matchers work as expected if you have to test against multiple arms. Generator objects just return themselves, so the matchers will not work as expected, because the second time you try to match against an array pattern, the first few items will have already been consumed by the previous one.
Here's a specific example:
function* range(start, end) {
  for (var i = start; i < end; i++) yield i;
}

class Sequence {
  constructor(start, end) { this.start = start; this.end = end; }
  [Symbol.iterator]() { return range(this.start, this.end); }
}

const iteratorNums = range(1, 6);
const iterableNums = new Sequence(1, 6);
const arrayNums = [1, 2, 3, 4, 5];

match(arrayNums) {
  when ([a, b])
    console.log(a, b);
    // doesn't match, because the array is length 5
  when ([c, d, ...rest])
    console.log(c, d, rest);
    // matches, logging (1, 2, [3, 4, 5])
}

match(iteratorNums) {
  when ([a, b])
    console.log(a, b);
    // doesn't match, because the iterator still has items,
    // but did consume three items off of the iterator
  when ([c, d, ...rest])
    console.log(c, d, rest);
    // matches, logging (4, 5, [])
    // because the previous clause consumed the first three items
}

match(iterableNums) {
  when ([a, b])
    console.log(a, b);
    // doesn't match, because the iterator still has items
  when ([c, d, ...rest])
    console.log(c, d, rest);
    // matches, logging (1, 2, [3, 4, 5])
    // gets the same result as Array because, like Array,
    // each call to Symbol.iterator returns a fresh iterator
}
Ideally, of course, we get caching semantics, and all three of them have the same result, and the same observable behavior overall, with each element of the iterator requested exactly once.
Thank you for explaining, it's totally clear now!
FWIW, if it seemed like this question came from nowhere / was "overthinking": it stemmed from misinterpreting "non-builtin iterators are simply not..." as a statement about something the match algorithm would do rather than a description of the typical consequence of what it would do. (It also seems I had a very different idea of what "builtin iterator" refers to.)
@bathos i'm pretty sure none of those concerns are in danger; there's a preexisting reality that builtin iterables' iterators are reusable, but userland iterators (from generators) are not, altho iterators in general are not reusable.
How? I don't see how an ArrayIterator can be reused by calling @@iterator on it.
@Jack-Works .values() is an iterator, not a (non-iterator) iterable. I'm referring to built-in non-iterator iterables, i.e., all builtin iterables whose Symbol.iterator method isn't just "return this".
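A quick runnable check of that distinction:
const arr = [1, 2, 3];
// An Array is a non-iterator iterable: each request produces a fresh iterator.
console.log(arr[Symbol.iterator]() === arr[Symbol.iterator]()); // false

function* gen() { yield 1; }
const g = gen();
// A generator object is an iterator whose Symbol.iterator just returns itself.
console.log(g[Symbol.iterator]() === g); // true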
@Jack-Works if I understand right (and me trying to articulate this is also for the sake of getting confirmation about whether I’ve interpreted it correctly myself), that comment is saying that the @@iterator method of an iterable-that-is-not-itself-an-iterator is expected to be “reusable” in the sense that it would be unusual (even though not impossible) for it to yield different values if iterated twice consecutively. All built-in iterables-that-are-not-themselves-iterators* adhere to that behavior in a clean env, and it seems reasonable to say that’s how any such objects “should” behave, so if iterable matching behavior only accounts for this specific pattern, it can still be useful.
Obv @ljharb please correct me if I’m still wrong there.
* I think this is what folks may be using “built-in iterable” as a kind of shorthand for, which is part of what threw me off, though admittedly iterable-that-is-not-itself-also-an-iterator is a mouthful! It doesn’t actually have to be built-in, it’s just that the built-ins provide exemplars of the pattern that would be supported regardless of what ends up happening here.
"consumes the entire iterator" behavior is incompatible with plucking the head from an infinite or a finite but very large iterator, and "cached for the lifetime of the overarching match construct" is going to have confusing interaction with reference reuse and/or dynamic mutation: