tc39 / proposal-async-iterator-helpers

Methods for working with async iterators in ECMAScript
https://tc39.es/proposal-async-iterator-helpers/

Should `.drop`'s dropped items be read in parallel? #1

Open bakkot opened 1 year ago

bakkot commented 1 year ago

Consider:

asyncIteratorOfUrls
  .map(url => fetch(url))
  .drop(3)
  .next();

Should the first four fetches (the three dropped, and then one which will be returned) happen in parallel, or in sequence?

That is, should the first call to .next on the .drop(N) helper be like

let promises = [];
for (let i = 0; i < N; ++i) {
  promises.push(inner.next());
}
let actualNext = inner.next();
return Promise.all(promises).then(
  ps => ps.some(x => x.done) ? ({ done: true }) : actualNext,
  err => ({ done: true }),
);

or

try {
  for (let i = 0; i < N; ++i) {
    let { done } = await inner.next();
    if (done) return { done: true };
  }
} catch (e) {
  return { done: true };
}
return inner.next();

? (Some details of error handling omitted.)
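To make the difference concrete, here is a runnable sketch of both strategies, with a hypothetical delay-based source standing in for the fetches (`delay`, `makeSource`, and the function names are illustrative, not proposed API):

```javascript
// Hypothetical stand-in for a slow async source such as mapped fetches.
const delay = ms => new Promise(resolve => setTimeout(resolve, ms));

function makeSource() {
  let i = 0;
  return {
    async next() {
      await delay(20); // simulate one slow async step per item
      return { value: i++, done: false };
    },
  };
}

// Parallel: issue all N + 1 next() calls before awaiting any of them,
// so the underlying work overlaps (~1 delay total). Error handling
// omitted, as in the sketches above.
async function firstAfterDropParallel(inner, n) {
  const promises = [];
  for (let i = 0; i < n; ++i) promises.push(inner.next());
  const actualNext = inner.next();
  const dropped = await Promise.all(promises);
  if (dropped.some(r => r.done)) return { value: undefined, done: true };
  return actualNext;
}

// Sequential: await each dropped item before requesting the next one
// (~N + 1 delays total).
async function firstAfterDropSequential(inner, n) {
  for (let i = 0; i < n; ++i) {
    const { done } = await inner.next();
    if (done) return { value: undefined, done: true };
  }
  return inner.next();
}
```

Both versions resolve to the same result (the fourth item); the difference is only in how long the dropped items take to be consumed.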

I am inclined to say the former (they should be parallel), but that implies maybe we need a mechanism for async functions to limit their degree of parallelism (like in this npm package).

benlesh commented 1 year ago

IMHO, it should be done in sequence as a matter of practicality. In my experience with RxJS, in-sequence async is a lot easier for people to reason about than parallel async, particularly when it comes to an async sequence of values. A lot of the landmines people step on with RxJS have to do with using parallel operations on a series of values and expecting them to always come back in a particular order. People are just really bad at understanding that.

AsyncIterable, in particular, lends itself VERY WELL to backpressure control because it mostly deals with async values in sequence, and I think that having it iterate a bunch of values and let them resolve in parallel kind of goes against that advantage it has.

Given that nearly everything else about how people would deal with AsyncIterable involves handling the values in sequence (for await for example), I would shy away from a parallel drop, as it will be harder for folks to understand.

If it can be explained like this, it's easier to understand, IMO:

async function* drop<T>(source: AsyncIterable<T>, count: number): AsyncGenerator<T> {
  let seen = 0;
  for await (const value of source) {
    if (seen < count) {
      seen++;
    } else {
      yield value;
    }
  }
}

As a general rule, I'd recommend sticking to the simple, one-at-a-time semantics of AsyncIterable as read via for await, just to keep things more deterministic and lean harder on its advantages for dealing with backpressure.
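For reference, here is a usage sketch of a sequential drop like the one above, in plain JS with the type annotations removed (`numbers` and `collect` are illustrative helpers):

```javascript
// Sequential drop: skip the first `count` values, one at a time.
async function* drop(source, count) {
  let seen = 0;
  for await (const value of source) {
    if (seen < count) seen++;
    else yield value;
  }
}

// Illustrative source and a helper to drain an async iterable.
async function* numbers() {
  yield 1; yield 2; yield 3; yield 4; yield 5;
}

async function collect(iter) {
  const out = [];
  for await (const v of iter) out.push(v);
  return out;
}

// collect(drop(numbers(), 3)) resolves to [4, 5]
```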

bakkot commented 1 year ago

@benlesh Yeah, simplicity has a lot going for it.

Maybe we could make concurrency here opt-in via a second parameter. Or maybe we could just let people do it themselves when they want concurrency, though getting the error handling right is a little tricky.
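A user-land concurrent drop might look like the following sketch; the fiddly part is that when a dropped item rejects, the helper bails out without awaiting the extra `next()` promise, which would otherwise surface as an unhandled rejection (`dropConcurrently` is a hypothetical name, not proposed API):

```javascript
// Hypothetical user-land helper: consume n items concurrently, then
// return the following item, without leaking unhandled rejections.
async function dropConcurrently(inner, n) {
  const promises = [];
  for (let i = 0; i < n; ++i) promises.push(inner.next());
  const nextPromise = inner.next();
  // If a dropped item rejects we return early without awaiting
  // nextPromise, so attach a no-op handler to keep a later rejection
  // of that promise from being reported as unhandled.
  nextPromise.catch(() => {});
  try {
    const dropped = await Promise.all(promises);
    if (dropped.some(r => r.done)) return { value: undefined, done: true };
    return await nextPromise;
  } catch {
    // Mirror the parallel sketch above: treat a rejected dropped item
    // as exhaustion of the iterator.
    return { value: undefined, done: true };
  }
}
```

Whether a rejection among the dropped items should end the iteration or rethrow is exactly the kind of detail the helper would have to pin down.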

benlesh commented 1 year ago

Parameterized concurrency could be done in a follow up as well. FWIW, RxJS's parameterized concurrency is very rarely used. I know that's anecdotal, but it's a pretty big anecdote.

bakkot commented 1 year ago

@benlesh That's good feedback.

For some context: the reason this proposal is currently being held back is that we want to make sure there's room for adding consumer-driven concurrency at some point, which does necessarily affect the design now, not just in a follow-up. Consider the following snippet:

let x = asyncIteratorOfUrls
  .map(u => fetch(u));

await Promise.all([
  x.next(),
  x.next(),
]);

The idea is that if you go out of your way to call .next concurrently (as in this snippet), it should let you do work concurrently (in this case, have two simultaneous fetch calls running). See my initial presentation.

Importantly, if you're just doing a normal for await loop, this capability should be completely transparent to you (since it will never call next without waiting for the previous promise to settle). And we don't necessarily need to add any particular affordances for this as part of v1, though we'll probably have one. But the fundamental capability to be called concurrently does need to be designed in at the start, even if we don't expect people to use it that much, since the above snippet will be possible to write as soon as async iterator helpers ship.
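The following sketch illustrates why the capability has to be designed in rather than bolted on: a plain async generator queues concurrent `next()` calls, so a generator-based map can never overlap two mapper calls, while a hand-written helper that forwards `next()` eagerly can (all names here are illustrative, and `sleep` stands in for real async work like fetch):

```javascript
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Illustrative infinite source; next() hands out values synchronously
// in call order.
function makeCounter() {
  let i = 0;
  return {
    [Symbol.asyncIterator]() { return this; },
    next: () => Promise.resolve({ value: i++, done: false }),
  };
}

// Generator-based map: the async generator machinery serializes
// concurrent next() calls, so mapper calls never overlap.
async function* generatorMap(source, fn) {
  for await (const value of source) yield fn(value);
}

// Hand-written map: each next() call immediately pulls from the source
// and starts the mapper, so concurrent next() calls overlap.
function concurrentMap(source, fn) {
  return {
    async next() {
      const { value, done } = await source.next();
      if (done) return { value: undefined, done: true };
      return { value: await fn(value), done: false };
    },
  };
}
```

With a mapper that takes ~30ms, two concurrent `next()` calls on `generatorMap` take ~60ms (serialized), while on `concurrentMap` they take ~30ms (overlapped), yet both produce the same values in the same order.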

But yes, probably people who are just calling .drop aren't particularly intending to do so concurrently and don't need that. And we probably don't need to include any parameterized concurrency helpers (except bufferAhead) in the initial version.