Consume input lazily OR allow "querying" of TaskGroup

saurabhnanda commented 4 years ago

While this library helps in ensuring that only a limited/pre-defined number of actions are evaluated in parallel, it still has one problem (especially with very large input data-sets). If the input data-set has N=2,000,000 elements, this creates 2,000,000 asyncs, even though 99% of them may not be running concurrently at any given time. Memory use therefore still grows linearly with the size of the input.

Even the most "lazy" function I could find, i.e. scatterFoldMapM, is only lazy wrt the output (i.e. it doesn't try to collect ALL the output). However, if I'm not mistaken, even this function will create all the asyncs immediately, even if they cannot all run concurrently.

Therefore, the title of this issue. I believe this can be handled in two possible ways:

Having a new function with the following type signature, which consumes the input lazily (is this another continuation? I'm not sure!):

```
someFunc :: (MonadIO m, Monoid b)
         => TaskGroup
         -> m (IO a)                          -- ^ producer of monadic actions
         -> (Either SomeException a -> m b)   -- ^ consumer of results
         -> m b
```

Allowing one to query the TaskGroup to see how many slots are vacant. This allows one to write complex scheduling logic for when to push a task.
```
 vacantSlots :: TaskGroup -> Int
```
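
To make the first option concrete, here is a minimal sketch of lazy input consumption with bounded concurrency. It uses only `base`, not async-pool's `TaskGroup`, and the names `lazyFoldMapM`, `listProducer` and `sumExample` are hypothetical, not part of the library. It also simplifies the signature proposed above: it works in plain `IO` rather than `MonadIO m`, and the producer returns `Nothing` once the input is exhausted. A fixed number of workers each pull the next action only when they are free, so at most that many actions are in flight regardless of N. (As an aside on the second option: since the number of vacant slots changes over time, a query like `vacantSlots` would presumably have to return its result in `STM` or `IO` rather than as a pure `Int`.)

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (SomeException, finally, try)
import Control.Monad (replicateM)
import Data.IORef (atomicModifyIORef', newIORef)
import Data.Monoid (Sum (..))

-- | Hypothetical helper (NOT part of async-pool): run actions pulled from a
-- producer with at most @n@ in flight, combining results monoidally in
-- completion order (so the order is only deterministic for commutative monoids).
lazyFoldMapM
  :: Monoid b
  => Int                              -- ^ maximum number of concurrent workers
  -> IO (Maybe (IO a))                -- ^ producer; Nothing means the input is exhausted
  -> (Either SomeException a -> IO b) -- ^ consumer of results
  -> IO b
lazyFoldMapM n next consume = do
  lock <- newMVar ()                  -- serialises calls to the producer
  acc  <- newMVar mempty              -- running monoidal result
  dones <- replicateM (max 1 n) $ do
    done <- newEmptyMVar
    -- Signal completion even if the worker dies with an exception.
    _ <- forkIO (worker lock acc `finally` putMVar done ())
    pure done
  mapM_ takeMVar dones                -- wait for every worker to finish
  readMVar acc
  where
    worker lock acc = do
      -- An action is requested only when this worker is idle.
      mAction <- withMVar lock (const next)
      case mAction of
        Nothing     -> pure ()
        Just action -> do
          r <- try action
          b <- consume r
          modifyMVar_ acc (\x -> pure $! x <> b)
          worker lock acc

-- | Hypothetical adapter: turn a (possibly lazy) list of actions into a producer.
listProducer :: [IO a] -> IO (IO (Maybe (IO a)))
listProducer xs = do
  ref <- newIORef xs
  pure $ atomicModifyIORef' ref $ \ys -> case ys of
    []       -> ([], Nothing)
    (y : zs) -> (zs, Just y)

-- | Sum the numbers 1..1000 using four workers; failed actions count as mempty.
sumExample :: IO Int
sumExample = do
  next <- listProducer [pure i | i <- [1 .. 1000 :: Int]]
  Sum total <- lazyFoldMapM 4 next (either (const (pure mempty)) (pure . Sum))
  pure total
```

Because the list passed to `listProducer` is only forced one cell at a time, a lazily generated input of 2,000,000 actions would not need to be materialised up front. The trade-off in this sketch is that an exception thrown by the consumer or the producer silently ends that worker instead of being rethrown to the caller.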

jwiegley commented 4 years ago

I like your thinking here. Eagerly evaluating the requested tasks into async jobs before they get scheduled does sound like an unfortunate choice.

saurabhnanda commented 4 years ago

Should I attempt a PR?

jwiegley commented 4 years ago

@saurabhnanda I'd be quite interested to see what you come up with, sure!

l29ah commented 1 year ago

Check out the proposed solution; it could also be used for scatterFoldMapM.

jwiegley / async-pool

Consume input lazily OR allow "querying" of TaskGroup #20