tc39 / proposal-iterator-helpers

Methods for working with iterators in ECMAScript
https://tc39.es/proposal-iterator-helpers

Making it based on generators makes it impossible to poll values in parallel for iterators that support it #128

Closed dead-claudia closed 3 years ago

dead-claudia commented 3 years ago

This came up in someone proposing an alternate formulation of my promise scheduling proposal. (H/T @Jonasdoubleyou for the suggestion that would require this)

Async generators enforce synchrony:

"use strict"

async function *emit() {
    console.log("emit a")
    await Promise.resolve()
    console.log("emit resolve a")
    yield "a"
    console.log("emit b")
    await Promise.resolve()
    console.log("emit resolve b")
    yield "b"
    console.log("emit c")
    await Promise.resolve()
    console.log("emit resolve c")
    yield "c"
    console.log("emit done")
}

const gen = emit()
console.log("next 1")
gen.next().then(v => console.log(`1: ${v.value}`))
console.log("next 2")
gen.next().then(v => console.log(`2: ${v.value}`))
console.log("next 3")
gen.next().then(v => console.log(`3: ${v.value}`))
console.log("next 4")
gen.next().then(v => console.log(`4: ${v.value}`))
console.log("done")

// Logs:
// next 1
// emit a
// next 2
// next 3
// next 4
// done
// emit resolve a
// emit b
// 1: a
// emit resolve b
// emit c
// 2: b
// emit resolve c
// emit done
// 3: c
// 4: undefined

However, some cases, like stateless .map and .filter transforms, could work with an async iterator that allows concurrent reads. The spec doesn't spell that out as a possibility - it just falls naturally out of the contract for how async iterables are enumerated.
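For illustration (this is not something the proposal specifies), here is a minimal hand-written async iterator whose next() can be called again before the previous call settles - something an async-generator-based implementation cannot do, since generator requests are queued:

```javascript
// A minimal sketch: each next() call starts its own independent work
// immediately, instead of queueing behind the previous call the way
// async generators do. The name concurrentRange is made up for this demo.
function concurrentRange(n) {
    let i = 0
    return {
        [Symbol.asyncIterator]() { return this },
        next() {
            if (i >= n) return Promise.resolve({value: undefined, done: true})
            // The value is claimed synchronously, so overlapping callers
            // each get a distinct element rather than racing for one.
            const value = i++
            return Promise.resolve().then(() => ({value, done: false}))
        },
    }
}

// Two next() calls issued before either settles - both proceed in parallel.
const it = concurrentRange(3)
Promise.all([it.next(), it.next()]).then(([a, b]) => {
    console.log(a.value, b.value) // 0 1
})
```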

The hypothetical operator here is AsyncIterator.prototype.parallel. It would allow controlled parallel iteration of an async iterable, which would come in handy for things like enumerating files in a glob stream and breaking once one of them matches. Think of it like this pseudocode:

AsyncIterator.from(glob("src/**/*.js"))
    .asyncMap(file => fs.readFile(file))
    .filter(file => file.includes("whatever"))
    .parallel(1000)
    .take(1)
    .toArray()

I'm personally neutral-negative on the idea, but I can see the value of doing it this way, and I feel that it should be addressed whether you all want to leave the door open to this or not.
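For what it's worth, a userland approximation of such a parallel(n) operator can be sketched as a wrapper that keeps up to n next() requests in flight against the source and yields results in order. The name, signature, and semantics here are hypothetical, not proposed spec text, and the wrapper only buys real overlap when the source genuinely supports concurrent next() calls (it may also over-request past the end, which a well-behaved iterator tolerates):

```javascript
// Hypothetical sketch: eagerly issue up to n next() calls on the source
// so per-item async work can overlap, while still yielding in order.
async function* parallel(source, n) {
    const it = source[Symbol.asyncIterator]()
    const buffer = []
    let done = false
    while (true) {
        // Keep up to n requests in flight at once.
        while (!done && buffer.length < n) {
            buffer.push(it.next())
        }
        if (buffer.length === 0) return
        const result = await buffer.shift()
        if (result.done) { done = true; continue }
        yield result.value
    }
}
```

Note this sketch is itself an async generator, so its own consumers are serialized; the parallelism is only between it and the source.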

rektide commented 3 years ago

It would be nice if filter indeed were not generator based & could stream results.

That said, I think your example is doable by hacking around the problem somewhat: have asyncMap synchronously return the promise rather than awaiting its resolution, and do the awaiting/filtering later:

AsyncIterator.from(glob("src/**/*.js"))
    .asyncMap(file => fs.readFile(file))
    .parallel(1000) // perhaps parallel could do the resolving, which would simplify code below
    .filter(async (file) => (await file).includes("whatever"))
    .take(1)
    .toArray() // the file promise, unless we add another step to resolve

This idea of having asyncMap synchronously return the promise may need to be even uglier in some cases, to avoid the synchronization async generators introduce. If the above doesn't work, it might have to become:

AsyncIterator.from(glob("src/**/*.js"))
    .asyncMap(file => ({promise: fs.readFile(file)}))
    .parallel(1000)
    .filter(async (file) => (await file.promise).includes("whatever"))
    .take(1)
    .toArray()

Gross. That would be unfortunate.

devsnek commented 3 years ago

this is definitely out of scope for this proposal (i wouldn't be against this kind of functionality existing in the language though)

rektide commented 3 years ago

@devsnek is there a reason we want to base this proposal on generators? it feels like convenience rather than necessity. i don't see that re-implementing these helpers without generators would be challenging.

it would be very nice to not have these iteration helpers be so constrained.

devsnek commented 3 years ago

the problem this issue is discussing is out of scope. the implementation of the methods in this proposal have nothing to do with that.

rektide commented 3 years ago

i don't understand your assertion that the problem is out of scope. how so? iteration-helpers describes itself as:

A proposal for several interfaces that will help with general usage and consumption of iterators in ECMAScript.

surely we want to do a good job making sure these iteration helpers work in asynchronous use cases? why would we exclude asynchronous operations from iteration-helpers functioning?

edit: as i think this over some, i am becoming somewhat sympathetic to this being chalked up as an async-iteration issue, not a helpers issue. my second pseudocode example, of creating {promise} objects to bypass the async-iteration limitations, is a reasonable demonstration of async-iteration's limitations.

trying to work with the constraints we have, perhaps we need to move parallelization out of the middle of the pipeline:

AsyncIterator.from(glob("src/**/*.js"))
    .parallelRaceMap(file => fs.readFile(file), 1000)
    .filter(file => file.includes("whatever"))
    .take(1)
    .toArray()
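A userland sketch of what that hypothetical parallelRaceMap might do - run up to limit mapper calls concurrently and yield results in completion order - could look like this (all names and semantics here are illustrative, not proposed spec text):

```javascript
// Hypothetical sketch: keep up to `limit` mapper calls in flight and
// yield whichever settles first, so a downstream take(1) can stop as
// soon as any one of them finishes.
async function* parallelRaceMap(source, fn, limit) {
    const it = source[Symbol.asyncIterator]()
    const pool = new Map() // id -> promise tagged with its id
    let nextId = 0
    let sourceDone = false
    while (true) {
        // Top up the pool from the source.
        while (!sourceDone && pool.size < limit) {
            const {value, done} = await it.next()
            if (done) { sourceDone = true; break }
            const id = nextId++
            pool.set(id, Promise.resolve(fn(value)).then(result => ({id, result})))
        }
        if (pool.size === 0) return
        // Yield results in completion order, not input order.
        const {id, result} = await Promise.race(pool.values())
        pool.delete(id)
        yield result
    }
}
```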

Jonasdoubleyou commented 3 years ago

As the one who started the original conversation, I'd like to add that .parallel was just a "first idea", and although I still like how it would hypothetically chain, it pretty much collides with the spec (including this proposal). Yielding a Promise inside an async generator already awaits that Promise, so I'd say doing the same in .map and .filter is reasonable (and the opposite would be very confusing). Thus I agree with @devsnek that this is "not an issue" for this proposal.
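The await-on-yield behavior described here is indeed what the spec does: a Promise yielded from an async generator is awaited before being delivered, so the consumer always receives the settled value, never the Promise itself. A quick demonstration:

```javascript
// yield in an async generator awaits its operand, so the consumer
// sees the string, not a Promise wrapping it.
async function* yieldsAPromise() {
    yield Promise.resolve("resolved value")
}

yieldsAPromise().next().then(({value}) => {
    console.log(typeof value, value) // string resolved value
})
```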

That said, there are still plenty of options for introducing parallel processing into iterators, e.g. a parallelMap, which I guess can be discussed in a follow-up proposal (or the one already drafted by @isiahmeadows).