ssec-jhu / dplutils

Distributed (Data) Pipeline Utilities
BSD 3-Clause "New" or "Revised" License

stream executor back pressure #65

Open amitschang opened 5 months ago

amitschang commented 5 months ago

Upstream tasks may produce output faster than downstream tasks can consume it, so the queues just keep filling up. For example, if downstream tasks need special hardware and the cluster has spare CPUs, the system will keep piling on upstream tasks. This is not a big problem in itself, but it could become one if we end up filling the object store.

A simple form of back pressure could be: stop scheduling upstream tasks if size(downstream_queue) > X * total_tasks_runnable / batch_size, where total_tasks_runnable is the number of tasks that could fit on the current cluster if it were empty, and X is some scaling factor.
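
A minimal sketch of that check in Python, to make the threshold concrete. The function names and the default value of `scale` (the X above) are made up for illustration and are not part of the dplutils API. How the queue size and `total_tasks_runnable` would actually be measured is left open.

```python
def queue_limit(total_tasks_runnable: int, batch_size: int, scale: float = 2.0) -> float:
    """Return the downstream queue size above which upstream work should pause.

    Implements: X * total_tasks_runnable / batch_size, with X passed as `scale`.
    The default of 2.0 is an arbitrary placeholder, not a recommended value.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return scale * total_tasks_runnable / batch_size


def should_pause_upstream(
    downstream_queue_size: int,
    total_tasks_runnable: int,
    batch_size: int,
    scale: float = 2.0,
) -> bool:
    """Return True when the downstream queue is over the limit (strict '>')."""
    limit = queue_limit(total_tasks_runnable, batch_size, scale)
    return downstream_queue_size > limit


# Example: an empty cluster could run 16 tasks, batches hold 4 items, X = 2.
# The limit is 2 * 16 / 4 = 8, so a queue of 9 pauses upstream and 8 does not.
print(should_pause_upstream(9, total_tasks_runnable=16, batch_size=4))
print(should_pause_upstream(8, total_tasks_runnable=16, batch_size=4))
```

Keeping the comparison strict means upstream work resumes as soon as the queue drains back to the limit. A real executor might want some hysteresis to avoid rapid toggling, but that goes beyond what is proposed here.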