tc39 / proposal-async-iterator-helpers

Methods for working with async iterators in ECMAScript
https://tc39.es/proposal-async-iterator-helpers/

Error handling #5

Open bergus opened 1 year ago

bergus commented 1 year ago

Calling .next() multiple times without waiting for each result means we have multiple independent unsettled promises in flight at the same time. For all the methods in this proposal, but especially for something like .buffered(), what are the guarantees that we want to make wrt errors? Do we want to fail fast like Promise.all does? Do we want to collect errors into an AggregateError, like Promise.any does? How do we ensure that no unhandled promise rejections crash the process? What other things did I miss?

And what is the (concurrent) asynchronous iterator protocol for dealing with errors? Does one .next() call returning a promise that rejects mean that all promises returned from subsequent calls a) must reject as well, b) must immediately fulfill with { done: true, value: undefined }, or c) are not constrained at all? Does this hold for still-pending promises created by earlier .next() calls, for promises from later .next() calls that were already in flight when the rejection happened, or only for promises returned by .next() calls made after the rejection? (Touching on #3)

bakkot commented 1 year ago

If we stick to the consistency property in #3, I think it's pretty straightforward: errors get buffered like anything else, and don't get aggregated. If you get multiple errors from multiple calls, only the first one is surfaced, and the others (like all other results from calls after a rejection) get tossed into the void.

I'm neutral on whether those count as unhandled promise rejections, since I think that situation shouldn't come up for well-behaved iterators:

And what is the (concurrent) asynchronous iterator protocol for dealing with errors?

There are no guarantees in the spec right now, but the behavior of for await and yield* is suggestive. Promises from an async iterator are consumed in order, and once one rejects (or settles with { done: true }), you assume that the iterator has closed itself (including performing any cleanup it needs to do).

For iterators which behave like those in the standard library, the web platform, or Node, as well as those you get from async generators, once any call to .next has rejected or settled with { done: true }, all subsequent calls (including those made while waiting for the first one to settle) will return Promise.resolve({ done: true }). For such an iterator, you can never be in the situation of getting multiple rejections from multiple calls to .next().
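For example, an async generator already behaves this way (a minimal demonstration, nothing proposal-specific):

async function* gen() {
  yield 'a';
  throw new Error('boom');
}
const it = gen();
const p1 = it.next();
const p2 = it.next();
const p3 = it.next(); // requested while p1 and p2 are still pending
p1.then(r => console.log(r));          // { value: 'a', done: false }
p2.catch(e => console.log(e.message)); // 'boom'
p3.then(r => console.log(r));          // { value: undefined, done: true }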

For helpers to have that property, of course, the promises from a sequence of concurrent calls to .next will need to settle in order, as I've written in #3; that's true for most other async iterators as well. So I think in the helpers we will not even register a listener for the promise from a call to .next() until all the previous calls have settled. And if one call to .next() rejects (or returns { done: true }), then we'll just discard the promises we got from later calls without registering any listeners, on the assumption that those would return { done: true }.
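Roughly, the discipline looks like this (a sketch, not the actual spec mechanics), wrapping an underlying iterator iter:

function orderedNext(iter) {
  let chain = Promise.resolve({ done: false, value: undefined });
  return function next() {
    const raw = iter.next(); // start the underlying work immediately
    chain = chain.then(
      prev => prev.done
        // An earlier call already finished the stream: report done and
        // drop `raw` without ever registering a listener on it.
        ? { done: true, value: undefined }
        // Only now, after every earlier call has settled, do we adopt
        // (i.e. register a listener on) `raw`.
        : raw,
      // An earlier call rejected: assume the iterator closed itself.
      () => ({ done: true, value: undefined })
    );
    return chain;
  };
}

With this, next() can be called concurrently, results settle strictly in order, and at most one rejection is ever surfaced; the discarded promises are exactly the ones whose errors get tossed into the void.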

bergus commented 1 year ago

I'm neutral on whether those count as unhandled promise rejections, since I think that situation shouldn't come up for well-behaved iterators

Even if you have your iterator promises settle in order ("well-behaved"), that does not necessarily include the promises produced by callbacks (here, the functions passed through .map). Let's take this example:

const iter1 = AsyncIterator.from([
  () => new Promise((resolve, reject) => { setTimeout(resolve, 1000, 'a'); }), // fulfills with 'a' after 1000ms
  () => new Promise((resolve, reject) => { setTimeout(reject,   500, 'b'); }), // rejects with 'b' after 500ms
  () => new Promise((resolve, reject) => {                    reject('c'); }), // rejects with 'c' immediately
  () => new Promise((resolve, reject) => { setTimeout(reject,  1500, 'd'); }), // rejects with 'd' after 1500ms
]).map(fn => fn());
const iter2 = iter1.buffered(4); // all four promises are started concurrently
try {
  await iter2.forEach(console.log);
} catch (err) {
  console.error(err);
}

What is the error that will be caught, and when? I guess you'd say that it's b after 1000ms (after logging a), but that means there are unhandled rejected promises (b and c) around for that time, and another one (d) later. Treating these as unhandled rejections and crashing the process would be absolutely unendurable.

Errors get buffered like anything else, and don't get aggregated. If you get multiple errors from multiple calls, only the first one is surfaced, and the others (like all other results from calls after a rejection) get tossed into the void.

I see. This matches the behaviour of Promise.all, but this is not always desirable. Could we change the protocol (and the map implementation) so that one can implement a version of .buffered() that ignores errors from buffered promises and yields remaining results, or collects them into an AggregateError?

bakkot commented 1 year ago

What is the error that will be caught, and when? I guess you'd say that it's b after 1000ms (after logging a), but that means there are unhandled rejected promises (b and c) around for that time, and another one (d) later. Treating these as unhandled rejections and crashing the process would be absolutely unendurable.

Yeah, that's a good point. My intention is for the third and fourth calls to .next to settle with { done: true }, so you still can't end up with multiple calls to .next() which reject, but those errors do still occur (and get tossed into the void). I agree those should not be treated as unhandled rejections.
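In userland terms, keeping those from being reported is just a matter of attaching a no-op handler before dropping the promise (a sketch of what an implementation might do, not spec text):

// Stand-in for the promise from a later .next() call that a helper is
// dropping after an earlier rejection:
const discarded = Promise.reject(new Error('tossed into the void'));
discarded.catch(() => {}); // observe and swallow the eventual rejection,
                           // so the host never reports it as unhandled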

Could we change the protocol (and the map implementation) so that one can implement a version of .buffered() that ignores errors from buffered promises and yields remaining results, or collects them into an AggregateError?

We could, in theory, but that would be a pretty big change to the protocol, and would probably require changes to for await of. Right now, if a for await of loop hits a rejected promise, it does not call .return on the iterator which yielded it; so the assumption is that an error indicates the end of the iterator just as much as returning { done: true } does, and, in particular, indicates that the iterator has already done any cleanup it needs to do.
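A minimal demonstration of those current semantics (run in a module, where top-level await is available):

const failing = {
  [Symbol.asyncIterator]() { return this; },
  async next() { throw new Error('boom'); },
  async return() {
    console.log('cleanup'); // never logged
    return { done: true, value: undefined };
  },
};
try {
  for await (const value of failing) {
    // never entered
  }
} catch (err) {
  // The rejection propagates here, but .return() was not called, so the
  // iterator never got a chance to clean up.
  console.error(err.message); // 'boom'
}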

If we make it so that there are built-in facilities for reading an iterator past an error, that assumption is no longer valid. If you vended an iterator which could be read past an error, and someone used it in a for await of loop with the current semantics, your iterator would never get a chance to clean up. That seems bad. We could in principle change for await of to call .return even when it hits an error, but that might break existing stuff, and I'd be quite reluctant to do so.

I'd prefer to say that if you are creating a stream which can recover from errors, you just build it so that it vends { success: boolean, value } pairs and never actually throws an error. Then you can do things like "ignore errors" with a simple .filter; it all composes nicely. And the default assumption can continue to be that streams cannot recover from errors, as for await of assumes.
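For example (illustrative names only; assumes fetch plus the AsyncIterator.from, .filter, and .map from this proposal):

// A stream that can recover from failures wraps each outcome as an
// ordinary value, so its next() promises never reject:
async function* fetchAll(urls) {
  for (const url of urls) {
    try {
      const res = await fetch(url);
      yield { success: true, value: await res.text() };
    } catch (error) {
      yield { success: false, value: error };
    }
  }
}

// "Ignore errors" is then an ordinary filter, and collecting failures
// instead is just as easy:
const texts = AsyncIterator.from(fetchAll(['https://example.com/a', 'https://example.com/b']))
  .filter(r => r.success)
  .map(r => r.value);
for await (const text of texts) console.log(text);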