Open saurabhnanda opened 4 years ago
I like your thinking here. Eagerly evaluating the requested tasks into async jobs before they get scheduled does sound like an unfortunate choice.
Should I attempt a PR?
On Wed, 22 Jan 2020, 03:51 John Wiegley, notifications@github.com wrote:
I like your thinking here. Eagerly evaluating the requested tasks into async jobs before they get scheduled does sound like an unfortunate choice.
@saurabhnanda I'd be quite interested to see what you come up with, sure!
Check out the proposed solution; it could also be used for scatterFoldMapM.
While this library helps ensure that only a limited, pre-defined number of actions are evaluated in parallel, it still has one problem, especially with very large input data sets. If the input has N = 2,000,000 elements, this is going to create 2,000,000 asyncs, even though 99% of them might not be running concurrently at any given moment. Memory use therefore still grows linearly with the size of the input.
Even the laziest function I could find, i.e. scatterFoldMapM, is only lazy with respect to the output (i.e. it doesn't try to collect ALL the output). However, if I'm not mistaken, even this function will create all the asyncs immediately, even if it is not possible to run them concurrently. Hence the title of this issue. I believe this can be handled in two possible ways:
Having a new function with the following type signature, which consumes the input lazily (is this another continuation? I'm not sure!):
Allowing one to query the TaskGroup to see how many slots are vacant. This allows one to write complex scheduling logic for when to push a task.
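To make the first idea concrete, here is a rough sketch of what lazy consumption could look like. It uses only base (forkFinally and QSem) rather than async-pool's TaskGroup, and forBounded_ is a hypothetical name I made up for illustration, not an existing function in this library. It assumes the limit is at least 1 and, to keep the sketch short, it silently drops exceptions raised by workers.

```haskell
import Control.Concurrent (forkFinally)
import Control.Concurrent.QSem (newQSem, signalQSem, waitQSem)
import Control.Monad (forM_, replicateM_)

-- | Hypothetical helper (NOT part of async-pool): run an action on every
-- element of a list while keeping at most @limit@ worker threads alive.
-- Assumes @limit >= 1@. Worker exceptions are discarded in this sketch.
forBounded_ :: Int -> [a] -> (a -> IO ()) -> IO ()
forBounded_ limit xs act = do
  slots <- newQSem limit
  forM_ xs $ \x -> do
    -- Block until a slot is vacant, so a thread is only created
    -- for an element once it can actually start running.
    waitQSem slots
    -- Release the slot when the worker ends, normally or by exception.
    _ <- forkFinally (act x) (\_ -> signalQSem slots)
    pure ()
  -- Drain: re-acquiring every slot means all workers have finished.
  replicateM_ limit (waitQSem slots)
```

Because a thread is forked only after a slot has been acquired, the number of live workers never exceeds the limit, and the input list is demanded one cell at a time instead of being turned into N pending jobs up front. A real version inside the library would presumably want to return results and propagate exceptions, which this sketch does not attempt.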