tc39 / proposal-async-iterator-helpers

Methods for working with async iterators in ECMAScript
https://tc39.es/proposal-async-iterator-helpers/

Current status as of 2024-06 #22

Open bakkot opened 2 months ago

bakkot commented 2 months ago

I'll be presenting on the current status of the proposal to the committee next week, and figured I'd post here so people watching this repo can follow along. Here's my slides.

Specifically, my current plan is to include the bare minimum in this, and explore the space of unordered helpers (#7, #20) in a followup. That is, I'm planning to include

Notably I am not planning to include any helpers for doing concurrent forEach (and friends), nor any helpers for doing out-of-order concurrency. The latter means you will not be able to get optimal concurrency for all problems just using the things in this proposal, but it will still be better than nothing, and I'm intending for the design to leave room for multiple possible directions for unordered helpers (e.g., .map is allowed to resolve promises out-of-order, which leaves room for utilities like .bufferUnordered).

If the committee approves of this "minimal viable helpers" direction, I'll spec it and propose for advancement as soon as I can (though the next two months will be hectic for me so it may not happen at the immediately following meeting). And then hopefully I or other committee members will pursue other things (concurrent forEach, out-of-order concurrency, merge, etc) as followups.

cc @conartist6 @laverdet as people who have been particularly involved in discussions lately.

conartist6 commented 2 months ago

Thanks for tagging me!

I still have one showstopper-level concern, which is that this takes us from 1 set of helpers to 2, with the expectation that out-of-order concurrency could add a third set. If that is what is done, adding support for stream iterables (semi-synchronous iterables) would potentially take the number of needed clones of the helper API from 3 to 4 or 5, depending on whether semi-sync iteration should support both in-order and out-of-order concurrency.

bakkot commented 2 months ago

I'm basically fine with that. We already have these helpers on Array, so it's really more 2->3 than 1->2; from the user's perspective it's not really a new thing to learn, except for any details around concurrency.

And no matter what happens, I think we will definitely want this particular set of helpers, with this placement and these semantics. So I don't think we need to settle what we're doing about out-of-order concurrency in order to add these, unless we think that out-of-order concurrency would be included in these helpers rather than being its own set of functions, which I do not.

conartist6 commented 2 months ago

I'm not talking about out-of-order concurrency, which I agree doesn't need to be settled to add these. My semi-sync thing does need to be settled I think because it would affect how the in-order helpers should be implemented.

conartist6 commented 2 months ago

Defensive awaiting is expensive in a microbenchmark sense, but unlike other microbenchmark costs it does not tend to fade to the background as you add real levels of complexity to a system -- in fact instead of fading away the cost compounds continuously, accounting for an ever greater fraction of the overhead of a real system.

Not allowing out-of-order promise resolution costs wall clock time but not processor power. Defensive awaiting is a huge waste of processing power, more akin to making a big pile of fossil fuels and setting fire to it for no reason.
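The cost being described can be seen in miniature: awaiting a value that is not a promise still yields to the microtask queue, so every defensive `await` adds at least one tick per element even when no promise is involved. A minimal sketch (illustrative, not from the proposal):

```javascript
// Awaiting a non-promise still defers execution: `await 1` queues a
// microtask, so the synchronous push below runs before the continuation.
const order = [];
async function f() {
  order.push('before await');
  const x = await 1; // not a promise, but still costs a microtask tick
  order.push('after await ' + x);
}
f();
order.push('sync code runs first');
// Once the microtask queue drains, `order` is:
// ['before await', 'sync code runs first', 'after await 1']
```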

bakkot commented 2 months ago

The current plan is that we will await any non-primitive value returned by the callback given to these helpers. I could see a case for extending the exception to include objects which lack a callable then.

I don't think we could reasonably skip awaiting the values returned by the next methods of the underlying iterator, which I believe is what you mean by "semi-sync". And in any case the values returned by the next methods of the helpers must be Promises, since consumers of the async iterator protocol can reasonably be relying on that, so there would still be plenty of awaits around. The async iterator protocol is simply not suitable for cases where you care about the number of awaits.
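As a sketch of that constraint (illustrative driver code, not spec text): the protocol obliges `next()` to return a promise of `{ value, done }`, and a consumer is entitled to rely on that shape directly rather than going through `for await`:

```javascript
// The async iterator protocol requires next() to return a promise of
// { value, done }; consumers may rely on that shape directly.
async function* naturals() {
  let n = 0;
  while (true) yield n++;
}

const it = naturals();
// A consumer may call .then on the result rather than using `for await`,
// so every helper's next() must return a real promise.
it.next().then(({ value, done }) => {
  console.log(value, done); // 0 false
});
```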

Anyway, if you want to open an issue suggesting an alternative semantics, go for it.

conartist6 commented 2 months ago

If you have a different underlying iteration protocol, backwards compatibility is preserved, and you can also skip the await on values returned by next. A consumer will only create an iterator of that protocol if it knows how to handle it.

bakkot commented 2 months ago

Sure, but these methods are going on AsyncIterator.prototype. Async iteration is an existing protocol.

conartist6 commented 2 months ago

Ah, that's true, eh.

laverdet commented 2 months ago

Good luck! This proposal is deceptively challenging.

I still think that buffered is a flawed primitive in that it conflates concurrency of a mapping function with concurrency of the underlying iterable. I believe them to be separate policies.

Punting on concurrent fold operations means that we will still need third party tools for many tasks. I worry that we will end up with multiple conflicting concurrency models. The language will take the buffered approach which is insufficient to describe fold operations.

Personally, my preference is that the specification is dead simple and super predictable:

async function *map(iterable, callback) {
    let index = 0;
    for await (const value of iterable) {
        yield callback(value, index++);
    }
}
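For completeness, a runnable usage sketch of that `map` (the sketch is repeated so the snippet is self-contained; the driver code is illustrative, not part of the proposal):

```javascript
// laverdet's sketch, repeated here so the usage example runs standalone.
async function* map(iterable, callback) {
  let index = 0;
  for await (const value of iterable) {
    yield callback(value, index++);
  }
}

async function* source() {
  yield 1;
  yield 2;
  yield 3;
}

const results = [];
(async () => {
  for await (const doubled of map(source(), x => x * 2)) {
    results.push(doubled);
  }
  console.log(results); // [2, 4, 6]
})();
```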

As always, thanks for your work in exploring this proposal and including us along the way.

bakkot commented 2 months ago

I definitely agree there's still much more to do in this space, and that buffered is not sufficient for many problems. I don't think that means we should artificially restrict these helpers from being consumed concurrently, though.

MatAtBread commented 2 months ago

It's probably too late to pitch in, but (coincidentally) I have implemented quite a few of these in a real system here.

They're (probably) not 100% spec compliant, but they have a couple features I'd like to point out, as they have real-world benefits in terms of resource usage.

The first is that the "middleware" functions like .map and .filter are lazy - they do not count as consumers until they themselves have a consumer. This means that generators and the like don't get pulled until there is an ultimate consumer. This makes a big difference in my module (a UI framework), since mapping an async iterator does nothing, until that iterator is itself consumed by a call to .next() (or for await).
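The laziness being described can be sketched as follows (a hypothetical `lazyMap`, not this library's or the proposal's API): generator bodies do not run until the result is pulled, so wrapping a source performs no work on its own.

```javascript
// Lazy wrapping sketch: nothing is pulled from the source until the
// wrapped iterator is itself consumed. `pulled` tracks source activity.
let pulled = 0;
async function* tracked() {
  pulled++;
  yield 'a';
  pulled++;
  yield 'b';
}

// A hypothetical lazy map over async iterables.
async function* lazyMap(iterable, fn) {
  for await (const v of iterable) yield fn(v);
}

const mapped = lazyMap(tracked(), s => s.toUpperCase());
console.log(pulled); // 0 — wrapping alone did no work
mapped.next().then(({ value }) => console.log(pulled, value)); // 1 A
```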

The second is the introduction of the symbol Iterator.Ignore which if returned from the map function suppresses the subsequent yield. In fact, in my implementation, the underlying function is filterMap, that can both map AND filter, by conditionally returning Iterator.Ignore.

The third is the provision of the previous yielded value (or Ignore on the first iteration) to the map function, making it easy to implement a .unique(), since the mapper can compare the current and previous value (or members thereof).
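A rough sketch of the `filterMap` behaviour described in the last two paragraphs (the `Ignore` symbol and the previous-value argument are features of this comment's library, not the proposal):

```javascript
// Sketch: returning `Ignore` from the mapper suppresses the yield, and
// the mapper receives the previously yielded value (Ignore at first).
const Ignore = Symbol('Ignore');

async function* filterMap(iterable, fn) {
  let prev = Ignore;
  for await (const value of iterable) {
    const mapped = fn(value, prev);
    if (mapped !== Ignore) {
      prev = mapped;
      yield mapped;
    }
  }
}

// `unique` falls out: compare the current value to the previous yield.
async function* nums() { yield 1; yield 1; yield 2; yield 2; yield 3; }

const out = [];
(async () => {
  for await (const n of filterMap(nums(), (v, prev) => (v === prev ? Ignore : v))) {
    out.push(n);
  }
  console.log(out); // [1, 2, 3]
})();
```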

I'm aware I'm commenting very late in the process, and I'm not a spec-writer, but the laziness and generalised filterMap described above simplified and centralised a lot of similar code. I did initially start with the obvious implementations as described above, but found the cost of pulling from the source with no consumer of the mapped iterator surprisingly high in my code base. Once I'd addressed that in map, doing so in filter, unique, etc. proved irritatingly repetitive, and a generalisation made sense from a complexity PoV.

bakkot commented 2 months ago

These functions will be lazy, yes. There's not really any other way for them to be. You can compare them to the sync versions, which are already shipping in Chrome:

let iter = [0, 1, 2].values().map(x => { console.log('value: ' + x); return x + 1; }); // does nothing
// later...
iter.toArray(); // prints `value: 0`, `value: 1`, `value: 2`; returns `[1, 2, 3]`

The other ideas are interesting things to do in a library but, I think, probably not in the standard library.

In case you haven't run into this fact before, you may be interested to know that flatMap is a generalization of both map and filter, and strictly more powerful, in that it can map one element to any number of elements. filterMap is a restricted special case of flatMap.
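To illustrate the point, a sync sketch: with flatMap, yielding a one-element array maps, yielding an empty array filters, and yielding several elements expands.

```javascript
// flatMap subsumes both map and filter (sync generator for brevity):
// the callback returns zero or more elements per input element.
function* flatMap(iterable, fn) {
  for (const value of iterable) {
    yield* fn(value);
  }
}

// Filter-and-map in one pass: keep evens, double them.
const evensDoubled = [...flatMap([1, 2, 3, 4], x => (x % 2 === 0 ? [x * 2] : []))];
console.log(evensDoubled); // [4, 8]
```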

MatAtBread commented 2 months ago

Thanks for the reply - much appreciated. Is there a current model implementation I can take a look at? I'd prefer to move to standardised functions as far as possible.

bakkot commented 2 months ago

Not for async iterators, alas. If the committee approves of the direction in the OP I'll produce a polyfill and also get spec text ready as soon as I have time, though that may not be for the next few months.

For sync iterators, there's the native implementation in Chrome, and polyfills in es-iterator-helpers and core-js.

laverdet commented 2 months ago

@MatAtBread to echo what @bakkot said, laziness is table stakes for iterators.

Your implementation of filterMap is good. A few notes:

Otherwise I think the implementation is sound.

MatAtBread commented 2 months ago

Thanks for the feedback. I'll take a look at the details tomorrow and try and make sense of it all.

I'm not really familiar with the intricacies of flatMap, but it sounds like I should take a closer look.

ben-laird commented 2 months ago

Notably I am not planning to include any helpers for doing concurrent forEach (and friends), nor any helpers for doing out-of-order concurrency. The latter means you will not be able to get optimal concurrency for all problems just using the things in this proposal, but it will still be better than nothing, and I'm intending for the design to leave room for multiple possible directions for unordered helpers (e.g., .map is allowed to resolve promises out-of-order, which leaves room for utilities like .bufferUnordered).

I very much agree with the lack of out-of-order concurrency methods, especially to make this proposal move as fast as possible. I'd go one step further and say they warrant their own proposal; I probably did not state that purpose very well in my opening comment on #20, nor did I explain the reason as well as you did here.

Regardless, I agree on leaving room for concurrency so we can get it right. Even looking at the slides, I saw some methods for achieving concurrency I had never thought of before; letting those ideas steep while working on the minimal set of async iterator methods now is, I agree, the best plan. Thanks for making this issue to get everyone on the same page; it was very helpful for me.

bakkot commented 2 months ago

Update from plenary: committee agrees with this direction. I just ("just") need to specify it, and write a polyfill and some tests for experimentation purposes. Notes will be published sometime in the coming weeks with more details.

I'll probably also include zip, since that proposal is progressing and has an obvious analogy here. Like map, it will support concurrent calls to .next in the obvious way, i.e., calling .next on the zipped result will immediately trigger calls to the .next of all of the underlying iterators, wait for them all to settle as in Promise.all, and then resolve with that value. Also like map, its promises will be able to settle out of order (though unlike map, here that can happen only if that's true of at least one of the underlying iterators).
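The described zip semantics might be sketched roughly like this (a hypothetical helper, not spec text; it ignores closing the remaining iterators when one finishes early):

```javascript
// Sketch of the described zip semantics: each next() eagerly calls
// next() on every underlying iterator and combines results as in
// Promise.all, finishing as soon as any iterator is done.
function zipAsync(...iterators) {
  return {
    [Symbol.asyncIterator]() { return this; },
    async next() {
      const results = await Promise.all(iterators.map(it => it.next()));
      if (results.some(r => r.done)) return { value: undefined, done: true };
      return { value: results.map(r => r.value), done: false };
    },
  };
}

async function* letters() { yield 'a'; yield 'b'; }
async function* numbers() { yield 1; yield 2; yield 3; }

const zipped = [];
(async () => {
  for await (const pair of zipAsync(letters(), numbers())) {
    zipped.push(pair);
  }
  console.log(zipped); // [['a', 1], ['b', 2]]
})();
```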