orc-lang / orc

Orc programming language implementation
https://orc.csres.utexas.edu/
BSD 3-Clause "New" or "Revised" License
41 stars 3 forks source link

Replace ThreadPoolScheduler with WorkStealingScheduler #177

Closed arthurp closed 7 years ago

arthurp commented 7 years ago

The ThreadPoolScheduler uses a single centralized queue which is a fatal flaw for our use case. The work units are small enough that the queue handling (and contention on the queue) is a major performance bottleneck. It can result in only 50% usage on an 8-core computer.

By switching to a work-stealing system (as in WorkStealingScheduler) we avoid a centralized queue and only communicate between threads when a thread runs out of work. This dramatically reduces the amount of contention on threads.

The current WorkStealingScheduler implementation is not complete so it cannot simply be swapped in. The main missing piece is blocked thread detection, but other features may also be incomplete. We should also look into whether the ThreadPoolScheduler approach to handling blocking is a good one. It may be better for threads to announce when they may block (maybe not announce but set a thread local flag). Then detecting blocking will be much easier.

arthurp commented 7 years ago

Java's ForkJoinPool (the work stealing implementation WorkStealingScheduler uses) provides very few hooks for external control and supports a number of features we do not need. It is clearly optimized for fork-join style work loads which almost never block. It may be better to look for another work stealing implementation which is better designed for our needs. Specifically that we need to be able to cheaply "maybe block" since nearly every call into java may block. Site metadata can help somewhat with this, but ForkJoinPool blocking is extremely heavy weight so it may still be too expensive.

arthurp commented 7 years ago

Fixed in 8d0df0e01.