Open shtaif opened 11 months ago
Yep, makes sense.
It seems fair to me to say that a lot of real sources will be generators that are queueing calls, even if the transformers are not queueing them. If that is true the source would generate the remaining requested items, and then return as requested, and no amount of cajoling by the helpers would convince it to do otherwise, right?
Definitely agree that generators seem to be the intuitive go-to for most people; they certainly are for me. As much as we design these helpers/transformers to be transparent - absolutely, if the source is a generator, they still couldn't mitigate its "can't turn this thing off!" nature.
I really hope that with time more of those end-user use cases will involve already well-formed sources obtained from standard native APIs and/or ecosystem libraries, rather than having to write substantial generator code by hand.
A bit off-topic, but maybe ecosystem libraries might step in here to mitigate generator DX by offering a utility that wraps generators as-is, just changing their .return() behavior in a somewhat better way...
Also https://github.com/tc39/proposal-iterator-helpers/issues/223.
Both of those discussions preceded the change to support concurrency, though. Now that we are planning to allow concurrency, I've been thinking along the same lines as the OP.
Concurrency effectively creates a request response model, doesn't it? Isn't that a pretty reasonable way of thinking about what's going on?
a call to a helper's .return() should just transparently forward it to the source iterator's .return()
In addition to forwarding to the underlying iterator, we might want to also set an internal bit which marks the helper as "done", so that subsequent calls to .next aren't forwarded to the underlying iterator. I could go either way on that, I guess.
It's possible that it should also immediately settle any outstanding promises from prior calls to .next with { done: true }. I haven't yet thought about pros/cons for that question very much and welcome input. Currently leaning towards no: the underlying iterator can do that itself, if it wants that behavior.
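To make the two ideas above concrete, here's a minimal sketch (names are illustrative, not spec text or the actual helpers implementation) of a wrapper whose .return() forwards immediately to the source and flips an internal "done" bit so later .next() calls short-circuit without reaching the source:

```javascript
// Illustrative sketch only. .return() forwards straight to the
// underlying iterator, and an internal flag makes subsequent .next()
// calls resolve as done without ever touching the source.
function withForwardingReturn(source) {
  let closed = false;
  return {
    async next(...args) {
      if (closed) return { value: undefined, done: true };
      return source.next(...args); // delegated immediately, not queued
    },
    async return(value) {
      closed = true;
      if (typeof source.return === "function") {
        return source.return(value); // transparent forward
      }
      return { value, done: true };
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```

Note this sketch deliberately does NOT settle outstanding .next() promises itself, matching the "leaning towards no" position: any pull already forwarded to the source settles however the source decides.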
Concurrency effectively creates a request response model, doesn't it? Isn't that a pretty reasonable way of thinking about what's going on?
Arguably all async iterators are a request-response model, yes. But you could reasonably interpret .return as either "I will not make any future requests, but I still care about the results of any outstanding prior requests" or "I will not make any future requests and I no longer care about the results of any outstanding prior requests", so that doesn't immediately tell us whether return should be queued or forwarded immediately.
I think the latter is generally more useful, and better matches real use. Consider
let hasX = producer.buffered(concurrencyFactor).some(val => isX(val));
When .some finds x, it will call .return. Ideally that should be taken as a signal by the producer to stop doing work to serve any outstanding requests. Not all producers will handle it that way, of course (for example, async generators will have their own internal queue), but some will. If we add a cleanup helper like in the thread I linked above, for example, you might write
let controller = new AbortController();
let signal = controller.signal;
let hasX = AsyncIterator.from(urls)
  .map(url => fetch(url, { signal }))
  .cleanup(() => controller.abort())
  .buffered(concurrencyFactor)
  .some(val => isX(val));
When some finds the thing it's looking for, you want it to call .return immediately so it can trigger the cleanup callback and abort any outstanding fetches. There's other ways you could get the same effect in this simple snippet, but in more complicated ones (where the producer and consumer are separated) I think it's basically necessary that it work like this if we want to have the chance to clean up efficiently.
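For concreteness, a hypothetical cleanup helper along these lines could be sketched roughly as follows (purely illustrative; the name comes from the linked thread and the real shape, if any, would come out of that discussion):

```javascript
// Illustrative sketch of a hypothetical "cleanup" helper: forwards
// next/return to the source, and invokes the callback exactly once
// when the iterator closes, whether by exhaustion or by .return().
function cleanup(source, onCleanup) {
  let cleaned = false;
  const runCleanup = () => {
    if (!cleaned) {
      cleaned = true;
      onCleanup();
    }
  };
  return {
    async next(...args) {
      const result = await source.next(...args);
      if (result.done) runCleanup();
      return result;
    },
    async return(value) {
      runCleanup(); // fire immediately, alongside the forwarded .return()
      if (typeof source.return === "function") return source.return(value);
      return { value, done: true };
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```

The key property for the argument above: a forwarded-immediately .return() means the callback (e.g. controller.abort()) fires right away rather than after outstanding pulls drain.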
Incidentally you may both be interested in this (long) old thread on the original async generators repo and the many, many pingbacks.
It's possible that it [read: a call to return] should also immediately settle any outstanding promises from prior calls to .next with { done: true }. I haven't yet thought about pros/cons for that question very much and welcome input. Currently leaning towards no: the underlying iterator can do that itself, if it wants that behavior.
As much as this seems an appealing and intuitive behavior to settle on, I'm also leaning towards no on this, due to the flexibility it might trade away.
In addition to forwarding to the underlying iterator, we might want to also set an internal bit which marks the helper as "done", so that subsequent calls to .next aren't forwarded to the underlying iterator
In light of all the considerations elaborated above, which are supportive of helpers' "transparency" to their underlying iterator, I'd say it's maybe better to stick to that in this regard too - to not manage any internal "done" state of their own. I agree it barely makes sense in most scenarios to forward subsequent .next() calls to the underlying iterator after having reported a { done: true }. But still, just for the sake of being predictable and consistent with the guiding principles, I guess. Would love to see anyone coming up with more views about this dilemma.
...you could reasonably interpret .return as either "I will not make any future requests, but I still care about the results of any outstanding prior requests" or "I will not make any future requests and I no longer care about the results of any outstanding prior requests", so that doesn't immediately tell us whether return should be queued or forwarded immediately.
This is a really interesting point. I'd like to share a semantic interpretation of my own on top of this view, in the hope it might contribute something to the main subject here.
.return indeed doesn't inherently imply either of these options in particular - "I still care" or "I no longer care" about the outcomes of outstanding prior requests. But it doesn't follow that in the async iteration model the producer has to choose only one way, leaving any consumers to accept/adapt to what's dictated. Instead, in principle a producer can actually facilitate both of these ways without coupling, and the consumer can always choose which way it wants to shut down the pipeline.
What I'm talking about is simply the difference between these two methods of consumption we're already familiar with:
const iterator = getSomeIterableIterator();

(async () => {
  for await (const value of iterator) {
    console.log(value);
  }
})();

// And later, via some other code context:
await iterator.return();
The above shows a consumer that closes the iterator mid-action - during a prior in-progress pull - because it does this disconnected from the for await...of loop, which would be paused awaiting some pull at that moment. This quite naturally implies "I no longer care about the results of any outstanding prior requests". Note, obviously we're subject to the iterator supporting early cancellations (not a deal-breaker either way), but the indication of intent is pretty clear and semantically natural.
On the other hand, if the consumer wishes to indicate "I still care about the results of any outstanding prior requests", it could simply perform the closing in between iterations. Looked at plainly, this essentially opts to close "without canceling outstanding requests" as we've coined it, since clearly we first drain any outstanding request and THEN close (with break;):
const iterator = getSomeIterableIterator();

(async () => {
  for await (const value of iterator) {
    console.log(value);
    if (hadEnoughValues()) {
      break;
    }
  }
})();
The bottom line in all that: the choice between closing with or without canceling outstanding requests can be seen as purely a consumer choice, and how the source iterator is implemented isn't inherently concerned with it (apart from specifying how, or whether, to actually abort its work early whenever the consumer opts for this).
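To sketch how a producer could facilitate the "I no longer care" path without dictating it, here's an illustrative source whose .return() immediately settles any outstanding pull with { done: true }; a consumer that breaks between iterations simply never races against that path (names and timings here are made up for the example):

```javascript
// Illustrative producer sketch: a mid-flight .return() settles any
// outstanding .next() promise right away. A consumer closing between
// iterations never exercises this, so both consumption styles work
// against the same producer.
function makeCancellableSource(delayMs) {
  let cancel;
  const cancelled = new Promise((resolve) => { cancel = resolve; });
  const doneResult = { value: undefined, done: true };
  return {
    async next() {
      const work = new Promise((resolve) =>
        setTimeout(() => resolve({ value: "data", done: false }), delayMs)
      );
      // whichever settles first wins: the slow pull, or cancellation
      return Promise.race([work, cancelled.then(() => doneResult)]);
    },
    async return(value) {
      cancel(); // settle the outstanding pull immediately
      return { value, done: true };
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```

In a real producer the cancellation would typically be wired to an AbortSignal so the underlying work (not just the promise) is actually stopped.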
As described in the README.md's "Concurrency" section, and as has also been discussed a lot across the Add a buffering helper issue thread, the helpers are designed to NOT queue up .next() calls like async generators would, but instead delegate each call over to the source iterator's .next() as soon as it is made. And I absolutely support this, of course.

Now, I'm bringing up this issue for us to consider the effect of calls to a helper's .return() in the same regard as above, which I didn't see touched on yet.

With every async generator today, a call to .return() gets queued on top of whatever pending .next() calls were made prior to it (which IMO is inherently problematic, since there is no guarantee when the last of the previous .next() calls are finally going to resolve, before which any underlying open resources cannot be released whatsoever). From our standpoint here though, I suppose a call to a helper's .return() should just transparently forward to the source iterator's .return(), and consequently resolve (the promise it returns) as soon as the source iterator's returned promise resolves.

That makes sense, right?
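The queueing behavior described above can be observed directly with a small self-contained demonstration (timings are arbitrary): an async generator's .return() only settles after the in-flight .next() does.

```javascript
// Demonstrates that async generators queue .return() behind a pending
// .next(): the return promise settles only after the in-flight pull.
async function* slowSource() {
  yield await new Promise((resolve) =>
    setTimeout(() => resolve("first"), 50)
  );
  yield "second";
}

async function demo() {
  const gen = slowSource();
  const order = [];
  await Promise.all([
    gen.next().then(() => order.push("next settled")),
    gen.return().then(() => order.push("return settled")),
  ]);
  return order; // ["next settled", "return settled"]
}
```

Even though .return() is called synchronously right after .next(), it cannot settle (and any finally blocks holding resources cannot run) until the pending pull resolves 50ms later - exactly the coupling the transparent-forwarding behavior would avoid.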