Sharpie opened this issue 3 years ago.
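Another aspect to this is that sshd limits how many connections can be waiting to authenticate at once. Its MaxStartups setting defaults to 10:30:100, so once 10 connections are still unauthenticated, it starts randomly refusing new ones (and refuses all of them at 100):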
https://man7.org/linux/man-pages/man5/sshd_config.5.html
So even if each task is fast, or a semaphore is pushed down into the task itself, the SSH daemon can fail tasks when parallelize() tries to start them all at once.
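The ordering point can be sketched as a toy Python model. It is not Bolt or OpenSSH code: fake_handshake is a hypothetical stand-in for establishing an SSH connection, and MAX_STARTUPS mirrors the first field of sshd's default MaxStartups. Because each worker acquires the semaphore before it connects, the number of simultaneous handshakes never exceeds the limit. A semaphore acquired inside the task body, after the connection is already up, would not bound the handshakes at all.

```python
import threading
import time

# Mirrors the first field of sshd's default "MaxStartups 10:30:100".
MAX_STARTUPS = 10

_lock = threading.Lock()
_pending = 0       # handshakes currently in progress
_peak_pending = 0  # highest value _pending reached during a run


def fake_handshake() -> None:
    """Hypothetical stand-in for opening and authenticating an SSH connection."""
    global _pending, _peak_pending
    with _lock:
        _pending += 1
        _peak_pending = max(_peak_pending, _pending)
    time.sleep(0.01)  # pretend the handshake takes a moment
    with _lock:
        _pending -= 1


def start_tasks(n_tasks: int, gate: threading.BoundedSemaphore) -> int:
    """Start n_tasks threads that each acquire `gate` BEFORE connecting.

    Returns the peak number of handshakes that were in progress at once.
    """
    global _peak_pending
    _peak_pending = 0

    def worker() -> None:
        with gate:  # the limit is applied before the connection opens
            fake_handshake()
        # ...the task body would run here, after authentication

    threads = [threading.Thread(target=worker) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return _peak_pending
```

For example, start_tasks(50, threading.BoundedSemaphore(MAX_STARTUPS)) starts 50 workers but never has more than 10 handshakes in flight.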
This issue has not had activity for 60 days and will be marked as stale. If this issue continues to have no activity for 7 days, it will be closed.
This is still important. As it stands, parallelize() offers concurrency but not efficiency: once the inputs are chunked to cap concurrency, the runtime of each group of tasks is pegged to its slowest execution.
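One way to quantify the slowest-execution effect is to compare the total runtime of the two scheduling strategies. The Python sketch below models scheduling only, with made-up durations in seconds, and both function names are hypothetical: chunked_makespan mimics chunking the inputs into batches of n (each batch waits for its slowest item), while pooled_makespan assumes n slots where the next item starts as soon as any slot is free.

```python
import heapq
from typing import List


def chunked_makespan(durations: List[float], n: int) -> float:
    """Total runtime when inputs run in batches of n: a batch only
    finishes when its slowest item does, and then the next batch starts."""
    return sum(max(durations[i:i + n]) for i in range(0, len(durations), n))


def pooled_makespan(durations: List[float], n: int) -> float:
    """Total runtime when at most n items run at once and each new item
    (taken in input order) starts on whichever slot frees up first."""
    slots = [0.0] * min(n, len(durations))  # time at which each slot is free
    heapq.heapify(slots)
    finish = 0.0
    for d in durations:
        start = heapq.heappop(slots)  # earliest available slot
        end = start + d
        finish = max(finish, end)
        heapq.heappush(slots, end)
    return finish
```

With durations [600, 10, 10, 10, 10, 10, 10, 600, 10, 10, 10, 10] and n = 4, the chunked schedule takes 1210 seconds and the pooled one takes 620: each 600-second outlier holds up a whole batch in the first case, but only occupies one slot in the second.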
This issue is stale and has been closed. If you believe this is in error, or would like the Bolt team to reconsider it, please reopen the issue.
Meh, why was this closed? I don't think this has been resolved. Could it be reopened, please?
The stale issue bot closed it (several times). I have disabled that bot; this will remain open until we triage it. Thanks.
Could this issue please be re-opened?
Use Case
Given a large list of inputs that must be processed on a remote node, the parallelize() function could be used with run_task to effect parallel processing:

However, if the processing step is resource intensive, there needs to be a way to control how many tasks are dispatched in parallel. This could be done by wrapping the call to parallelize in slice to chunk the inputs up:

This approach is extremely inefficient, though, if the processing time of each item has a large variance. For a hypothetical task that could finish in tens of seconds or tens of minutes, the slice approach would leave a large percentage of processing capacity idle for each chunk that contains an outlier, because the next chunk cannot start until the slowest item in the current one finishes. This could add hours to the total plan runtime.
Describe the Solution You Would Like
A way to limit the concurrency of a parallelize block so that it processes items as quickly as possible, but no more than n at a time. This could be an optional parameter to the parallelize() function.
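To make the requested behaviour concrete, here is a minimal Python sketch of a bounded parallel map. It is not Bolt's API: parallelize_limited, its limit argument, and run_demo are hypothetical names. A thread pool with max_workers=limit keeps at most limit calls in flight and hands the next input to whichever worker frees up first, so a slow outlier occupies one slot instead of stalling a whole chunk.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def parallelize_limited(items: Iterable[T], fn: Callable[[T], R], limit: int) -> List[R]:
    """Hypothetical bounded parallel map.

    Runs fn over items with at most `limit` calls in flight. Idle workers pull
    the next pending item as soon as they finish, so one slow item does not
    hold back the rest. Results come back in input order.
    """
    if limit < 1:
        raise ValueError("limit must be at least 1")
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(fn, items))


def run_demo(limit: int) -> Tuple[List[int], int]:
    """Square 20 numbers, where every fifth item is a slow outlier, and
    report the peak number of calls that were running at the same time."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def process(x: int) -> int:
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(0.1 if x % 5 == 0 else 0.02)  # simulated variable work
        with lock:
            state["running"] -= 1
        return x * x

    results = parallelize_limited(range(20), process, limit)
    return results, state["peak"]
```

For example, run_demo(3) returns the 20 squares in input order, with at most 3 calls ever running together. Bolt plans are written in the Puppet language, so a real option on parallelize() would look different; the sketch only illustrates the scheduling behaviour being asked for.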