project-iris / iris

Decentralized cloud messaging
iris.karalabe.com
Other
571 stars 32 forks source link

bugfix: ThreadPool.Terminate() does not wait for threads to finish #4

Closed abursavich closed 10 years ago

abursavich commented 10 years ago

Your project looks very interesting so I thought I'd grab a bug and help out :)

This should be a drop-in fix/replacement. All the previous tests pass and I added some extras. I did have to comment out a few lines from the previous test because it was accessing the (private) task queue which isn't accessible anymore (even from within the package).

I added some extra stuff in there too to prevent bad things from happening if Start or Terminate are called multiple times.

It's getting late here, so I'm gonna go crash but I can provide some other details later if you'd like.

karalabe commented 10 years ago

Cool, thanks :D

I'll try and look through it today, I just have to figure out how to match up my gitflow workflow with github (possibly by dumping gitflow for now, it complicates things more than it helps).

I'll get back to this asap :)

karalabe commented 10 years ago

I skimmed through your pull request. It's quite a heavy refactor, and introduces a few side effects that I tried hard to remove (i.e. my original implementations were very similar and I factored them out one by one). Most notably, it reintroduces a lot of context switches between the schedulers and workers.

In your proposed implementation scheduling an event requires one more context switch (scheduler -> looper -> worker) instead of the current single switch (scheduler -> worker). The problem is that Iris is all about throughput, so adding an additional go routine pass-through for every message would - imho - have a significant hit on the processing.

The second context switch that you've reintroduced is that a new go routine is spawned for every task, whereas in my current implementation workers only die if the task queue becomes empty. But when a burst of tasks are processed, there is no point in killing an re-spawning the workers, if there's work to do anyway. Go routines may be cheap, but nothing beats zero cost :).

Also my worker included an escape clause if the task crashes, whereas yours currently crashes the whole app :) No biggie, easy to fix, just something you accidentally removed.

Lastly, although all the pool tests pass, some higher level iris tests reach a deadlock. I'm guessing this is caused by you correctly fixing the termination, and my incorrect use of the pool somewhere :).

My proposal would be to break this quite complex pull request into multiple smaller ones and iteratively fix the encountered issues (tests/fixes, features and refactors should be separate).

All in all I think your solution looks great (it is much more Go-like), but I'm afraid of the performance implications and the produced deadlock will need to be investigated. Would you be willing to go through these iterative hoops? :)

abursavich commented 10 years ago

I haven't had a chance to check out the deadlock issue in the larger context, but I took another pass at keeping the original design mostly intact.

It might be a few days before I can come back and break it up into smaller iterative chunks, so I figured I'd push what I've got now in case you'd like a peek before then.

karalabe commented 10 years ago

This looks really great! :) Also very nice catch with the race condition requiring the respawn hack.

I think this pull request is perfect as is, no need for further splits. The reason I haven't merged it yet is because of the deadlock bug, now definitely in my code. I'll get right to tracking it down and will commit the two simultaneously (don't want to corrupt the master branch).

abursavich commented 10 years ago

Cool. I'm on my phone right now but I was thinking it might be better to change the Signal to a Broadcast. Just in case Terminate is called multiple times and more than one is waiting. With Signal the extra Terminates will deadlock. On Feb 15, 2014 1:45 PM, "Péter Szilágyi" notifications@github.com wrote:

Merged #4 https://github.com/karalabe/iris/pull/4.

Reply to this email directly or view it on GitHubhttps://github.com/karalabe/iris/pull/4 .