twitter / summingbird

Streaming MapReduce with Scalding and Storm
https://twitter.com/summingbird
Apache License 2.0
2.14k stars, 267 forks

Fix outstanding futures clear bottleneck #663

pankajroark closed 8 years ago

pankajroark commented 8 years ago

When maxWaitingFutures is high, AsyncBase can spend a lot of compute just clearing finished futures. The cost of clearing is proportional to the total number of outstanding futures, so if only a few of them have finished, we pay the full price for very little gain, and the same pattern can repeat on every execute. I found this happening in one of the jobs. This change keeps track of the number of pending futures and uses it to clear finished futures only when they make up a good proportion of the total outstanding, so that each clearing pass gives a good return on investment.
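To illustrate the idea, here is a minimal sketch of a ratio-gated sweep. It is not the code in this PR: `OutstandingFutures`, `minFinishedRatio`, and `maybeClear` are hypothetical names, it uses `scala.concurrent.Future` from the standard library rather than the futures AsyncBase actually works with, and it assumes `add` and `maybeClear` are called from a single thread.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.collection.mutable
import scala.concurrent.{ExecutionContext, Future, Promise}

// Hypothetical stand-in for the outstanding-futures bookkeeping (not AsyncBase itself).
// minFinishedRatio is an assumed tuning knob: sweep only when at least this
// fraction of the outstanding futures has finished.
final class OutstandingFutures[T](minFinishedRatio: Double) {
  private val outstanding = mutable.Queue.empty[Future[T]]
  private val pending = new AtomicInteger(0)

  // Runs completion callbacks on the completing thread so the counter update is cheap.
  private val sameThread: ExecutionContext = new ExecutionContext {
    def execute(r: Runnable): Unit = r.run()
    def reportFailure(cause: Throwable): Unit = ()
  }

  def size: Int = outstanding.size

  def add(f: Future[T]): Unit = {
    outstanding.enqueue(f)
    pending.incrementAndGet()
    // O(1) bookkeeping per future: one decrement when it completes.
    f.onComplete(_ => pending.decrementAndGet())(sameThread)
  }

  // Returns the finished futures that were removed, or Nil if the sweep was skipped.
  def maybeClear(): List[Future[T]] = {
    val total = outstanding.size
    val finished = total - pending.get
    if (total == 0 || finished < minFinishedRatio * total) Nil // O(1): not worth a full scan
    else outstanding.dequeueAll(_.isCompleted).toList          // O(total) scan, now justified
  }
}
```

With `minFinishedRatio = 0.5` and four outstanding futures, a call to `maybeClear()` after one completion is skipped, while a call after two completions scans the queue and removes both finished futures. The skipped call costs only a size lookup and a counter read, which is the saving described above.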

johnynek commented 8 years ago

A couple of small comments.

pankajroark commented 8 years ago

@johnynek I incorporated the comments. Could you take another look? Thanks.