project-iris / iris

Decentralized cloud messaging
iris.karalabe.com
Other
571 stars 32 forks source link

Worker pool - recursively adding items? #54

Closed artjomsimon closed 9 years ago

artjomsimon commented 9 years ago

Hi everyone,

I've played a bit with the worker pool, which works fine when I'm adding tasks to it which it can crunch on. However, while experimenting with a piece of C code using OpenMP and trying to see how it would look in Go, I understand that the way the pool is designed using a FIFO queue, there's a case where the pool deadlocks: Creating tasks recursively, while waiting for a result. What seems like an academic, contrieved example, actually is (an irregular algorithm where n tasks are created recursively for benchmarking purposes)

Here's an example: http://play.golang.org/p/eSeh1ijy1g (I've simply copy&pasted the work pool package into the play sandbox, with an example at the top, for the sake of demonstration)

Basically, the resursively called func is scheduled, blocking a place in the queue, waiting for a result that never comes, because the last recursion step that actually would unwind the chain of recursion, never gets scheduled, because the pool's queue is already full.

You can increase the pool size, of course, healing symptoms, only to run into the problem when the recursion is deeper than the size of the pool.

I understand that even if this kind of problem isn't that common, I thought this is a good place to ask, because I'm sure someone has already thought how this could be solved elegantly in Go and can point me in the right direction.

Thank you, Artjom

karalabe commented 9 years ago

Hey Artjom,

The thread pool in Iris was not designed to have running tasks inject further ones into the pool that they depend upon. My goal with the pool was to specifically put an upper limit on the concurrency of some independent tasks. So in its current incarnation, the Iris thread pool will not be too helpful to you.

Actually, the challenge that you need to solve would be a bit more intricate than task scheduling, since you'll need a way to yield running jobs and prioritize tasks. The question boils down to exactly how you would like to schedule the sub-tasks?

E.g. Task1 -> SubTask1, SubTask2, SubTask3 SubTask2 -> SubSubTask1, SubSubTask2

If you execute on a thread pool of size 2, Task1 will spawn ST1, ST2, and ST3 (first two would get scheduled), then ST1 finishes, but ST2 spawns 2 additional ones SST1, SST2. What do you schedule now? SST1 and SST2 or SST1 and ST3? Why?

The above choice of picking one or another scheduling will be problem dependent, and because of that I think you'd be better off with rolling your own specific thread pool and scheduler for your specific problem.

However, as a few suggested on the mailing list, I would first go with the simplest solution, and complicate if that doesn't suit you.

A. Can't it be simply done by spawning a goroutine for all tasks and subtasks? If not, why? B. Could it be done by having top level tasks scheduled by the pool, and all sub-tasks spawned in goroutines? If not, why? C. You'll need a prioritization logic to figure out what to run. Also you'll need to introduce yielding.

So I'd suggest very clearly defining the problem domain first (maybe you already have, it just didn't come through to me (and I'm not really experienced with OpenMP beyond the concepts)), and then deciding what complexity is necessary and what you can live without.

Cheers, Peter

PS: Since this discussion would be quite off-topic in this project's scope (as the Iris thread pool is not appropriate for your intended use case), I think it would be better to move it back into the gonuts mailing list where others could also pinch in with their ideas :) But I'd be happy to continue it there.