Open shtaif opened 11 months ago
Yep, makes sense.
It seems fair to me to say that a lot of real sources will be generators that are queueing calls, even if the transformers are not queueing them. If that is true the source would generate the remaining requested items, and then return as requested, and no amount of cajoling by the helpers would convince it to do otherwise, right?
Definitely agree that generators seem to be the intuitive go-to for most people; they certainly are for me. As much as we design these helpers/transformers to be transparent - absolutely, if the source is a generator, they still couldn't mitigate its "can't turn this thing off!" nature.
I really hope that with time more of those end-user use cases will involve already well-formed sources obtained from standard native APIs and/or ecosystem libraries, rather than having to write substantial generator code by hand.
A bit off-topic, but maybe ecosystem libraries might step in here to mitigate generator DX by offering a utility that wraps generators as-is, just changing their .return() behavior in a somewhat better way...
Also https://github.com/tc39/proposal-iterator-helpers/issues/223.
Both of those discussions preceded the change to support concurrency, though. Now that we are planning to allow concurrency, I've been thinking along the same lines as the OP.
Concurrency effectively creates a request response model, doesn't it? Isn't that a pretty reasonable way of thinking about what's going on?
a call to a helper's .return() should just transparently forward it to the source iterator's .return()
In addition to forwarding to the underlying iterator, we might want to also set an internal bit which marks the helper as "done", so that subsequent calls to .next aren't forwarded to the underlying iterator. I could go either way on that, I guess.
It's possible that it should also immediately settle any outstanding promises from prior calls to .next with { done: true }. I haven't yet thought about pros/cons for that question very much and welcome input. Currently leaning towards no: the underlying iterator can do that itself, if it wants that behavior.
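To make the two ideas above concrete, here's a minimal sketch (names are illustrative, not spec text or the actual helpers implementation) of a wrapper whose .return() forwards immediately to the source and flips an internal "done" bit so later .next() calls short-circuit without reaching the source:

```javascript
// Illustrative sketch only. .return() forwards straight to the
// underlying iterator, and an internal flag makes subsequent .next()
// calls resolve as done without ever touching the source.
function withForwardingReturn(source) {
  let closed = false;
  return {
    async next(...args) {
      if (closed) return { value: undefined, done: true };
      return source.next(...args); // delegated immediately, not queued
    },
    async return(value) {
      closed = true;
      if (typeof source.return === "function") {
        return source.return(value); // transparent forward
      }
      return { value, done: true };
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```

Note this sketch deliberately does NOT settle outstanding .next() promises itself, matching the "leaning towards no" position: any pull already forwarded to the source settles however the source decides.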
Concurrency effectively creates a request response model, doesn't it? Isn't that a pretty reasonable way of thinking about what's going on?
Arguably all async iterators are a request-response model, yes. But you could reasonably interpret .return as either "I will not make any future requests, but I still care about the results of any outstanding prior requests" or "I will not make any future requests and I no longer care about the results of any outstanding prior requests", so that doesn't immediately tell us whether return should be queued or forwarded immediately.
I think the latter is generally more useful, and better matches real use. Consider
let hasX = producer.buffered(concurrencyFactor).some(val => isX(val));
When .some finds x, it will call .return. Ideally that should be taken as a signal by the producer to stop doing work to serve any outstanding requests. Not all producers will handle it that way, of course (for example, async generators will have their own internal queue), but some will. If we add a cleanup helper like in the thread I linked above, for example, you might write
let controller = new AbortController();
let signal = controller.signal;
let hasX = AsyncIterator.from(urls)
  .map(url => fetch(url, { signal }))
  .cleanup(() => controller.abort())
  .buffered(concurrencyFactor)
  .some(val => isX(val));
When some finds the thing it's looking for, you want it to call .return immediately so it can trigger the cleanup callback and abort any outstanding fetches. There's other ways you could get the same effect in this simple snippet, but in more complicated ones (where the producer and consumer are separated) I think it's basically necessary that it work like this if we want to have the chance to clean up efficiently.
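For concreteness, a hypothetical cleanup helper along these lines could be sketched roughly as follows (purely illustrative; the name comes from the linked thread and the real shape, if any, would come out of that discussion):

```javascript
// Illustrative sketch of a hypothetical "cleanup" helper: forwards
// next/return to the source, and invokes the callback exactly once
// when the iterator closes, whether by exhaustion or by .return().
function cleanup(source, onCleanup) {
  let cleaned = false;
  const runCleanup = () => {
    if (!cleaned) {
      cleaned = true;
      onCleanup();
    }
  };
  return {
    async next(...args) {
      const result = await source.next(...args);
      if (result.done) runCleanup();
      return result;
    },
    async return(value) {
      runCleanup(); // fire immediately, alongside the forwarded .return()
      if (typeof source.return === "function") return source.return(value);
      return { value, done: true };
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```

The key property for the argument above: a forwarded-immediately .return() means the callback (e.g. controller.abort()) fires right away rather than after outstanding pulls drain.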
Incidentally you may both be interested in this (long) old thread on the original async generators repo and the many, many pingbacks.
It's possible that it [read: a call to return] should also immediately settle any outstanding promises from prior calls to .next with { done: true }. I haven't yet thought about pros/cons for that question very much and welcome input. Currently leaning towards no: the underlying iterator can do that itself, if it wants that behavior.
As much as this seems an appealing and intuitive behavior to settle on, I'm also leaning towards no on this, due to the flexibility it might trade away.
In addition to forwarding to the underlying iterator, we might want to also set an internal bit which marks the helper as "done", so that subsequent calls to .next aren't forwarded to the underlying iterator
In light of all the considerations elaborated above, which are supportive of helpers' "transparency" to their underlying iterator, I'd say it's maybe better to stick to that in this regard too - to not manage any internal "done" state of their own. I agree it barely makes sense in most scenarios to forward subsequent .next() calls to the underlying iterator after having reported a { done: true }. But still, just for the sake of being predictable and consistent with the guiding principles, I guess. Would love to see anyone coming up with more views about this dilemma.
...you could reasonably interpret .return as either "I will not make any future requests, but I still care about the results of any outstanding prior requests" or "I will not make any future requests and I no longer care about the results of any outstanding prior requests", so that doesn't immediately tell us whether return should be queued or forwarded immediately.
This is a really interesting point. I'd like to share a semantic interpretation of my own on top of this view, in the hope it might contribute something to the main subject here.
.return indeed doesn't inherently imply either of these options in particular - "I still care" or "I no longer care" about the outcomes of outstanding prior requests. But it doesn't follow that in the async iteration model the producer has to choose only one way, leaving any consumers to accept/adapt to what's dictated. Instead, in principle a producer can actually facilitate both of these ways without coupling, and the consumer can always choose which way it wants to shut down the pipeline.
What I'm talking about is simply the difference between these two methods of consumption we're already familiar with:
const iterator = getSomeIterableIterator();

(async () => {
  for await (const value of iterator) {
    console.log(value);
  }
})();

// And later, via some other code context:
await iterator.return();
The above shows a consumer that closes the iterator mid-action - during a prior in-progress pull - because it does this disconnected from the for await...of loop, which would be paused awaiting some pull at that moment. This quite naturally implies "I no longer care about the results of any outstanding prior requests". Note, obviously we're subject to the iterator supporting early cancellations (not a deal-breaker either way), but the indication of intent is pretty clear and semantically natural.
On the other hand, if the consumer wishes to indicate "I still care about the results of any outstanding prior requests", it could simply perform the closing in between iterations. Looked at plainly, this essentially opts to close "without canceling outstanding requests" as we've coined it, since clearly we first drain any outstanding request and THEN close (with break;):
const iterator = getSomeIterableIterator();

(async () => {
  for await (const value of iterator) {
    console.log(value);
    if (hadEnoughValues()) {
      break;
    }
  }
})();
The bottom line in all that: the choice between closing with or without canceling outstanding requests can be seen as purely a consumer choice, and how the source iterator is implemented isn't inherently concerned with it (apart from specifying how, or whether, to actually abort its work early whenever the consumer opts for this).
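To sketch how a producer could facilitate the "I no longer care" path without dictating it, here's an illustrative source whose .return() immediately settles any outstanding pull with { done: true }; a consumer that breaks between iterations simply never races against that path (names and timings here are made up for the example):

```javascript
// Illustrative producer sketch: a mid-flight .return() settles any
// outstanding .next() promise right away. A consumer closing between
// iterations never exercises this, so both consumption styles work
// against the same producer.
function makeCancellableSource(delayMs) {
  let cancel;
  const cancelled = new Promise((resolve) => { cancel = resolve; });
  const doneResult = { value: undefined, done: true };
  return {
    async next() {
      const work = new Promise((resolve) =>
        setTimeout(() => resolve({ value: "data", done: false }), delayMs)
      );
      // whichever settles first wins: the slow pull, or cancellation
      return Promise.race([work, cancelled.then(() => doneResult)]);
    },
    async return(value) {
      cancel(); // settle the outstanding pull immediately
      return { value, done: true };
    },
    [Symbol.asyncIterator]() {
      return this;
    },
  };
}
```

In a real producer the cancellation would typically be wired to an AbortSignal so the underlying work (not just the promise) is actually stopped.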
As described in the README.md's "Concurrency" section, and as has also been discussed a lot across the Add a buffering helper issue thread, the helpers are designed to NOT queue up .next() calls like async generators would, but instead delegate each call over to the source iterator's .next() as soon as it is made. And I absolutely support this, of course.

Now, I'm bringing up this issue for us to consider the effect of calls to a helper's .return() in the same regard as above, which I didn't see touched on yet.

With every async generator today, a call to .return() gets queued on top of whatever pending .next() calls were made prior to it (which IMO is inherently problematic, since there is no guarantee when the last of the previous .next() calls are finally going to resolve, before which any underlying open resources cannot be released whatsoever). From our standpoint here though, I suppose a call to a helper's .return() should just transparently forward to the source iterator's .return(), and consequently resolve (the promise it returns) as soon as the source iterator's returned promise resolves.

That makes sense, right?
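The queueing behavior described above can be observed directly with a small self-contained demonstration (timings are arbitrary): an async generator's .return() only settles after the in-flight .next() does.

```javascript
// Demonstrates that async generators queue .return() behind a pending
// .next(): the return promise settles only after the in-flight pull.
async function* slowSource() {
  yield await new Promise((resolve) =>
    setTimeout(() => resolve("first"), 50)
  );
  yield "second";
}

async function demo() {
  const gen = slowSource();
  const order = [];
  await Promise.all([
    gen.next().then(() => order.push("next settled")),
    gen.return().then(() => order.push("return settled")),
  ]);
  return order; // ["next settled", "return settled"]
}
```

Even though .return() is called synchronously right after .next(), it cannot settle (and any finally blocks holding resources cannot run) until the pending pull resolves 50ms later - exactly the coupling the transparent-forwarding behavior would avoid.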