progschj / ThreadPool

A simple C++11 Thread Pool implementation
zlib License
7.63k stars 2.21k forks source link

Race condition: ThreadPool deadlocks on destruction #72

Closed RafalSzefler closed 4 years ago

RafalSzefler commented 4 years ago

Have a look at the worker loop:

https://github.com/progschj/ThreadPool/blob/9a42ec1329f259a5f4881a291db1dcb8f2ad9040/ThreadPool.h#L41-L50

and the destructor of the thread pool:

https://github.com/progschj/ThreadPool/blob/9a42ec1329f259a5f4881a291db1dcb8f2ad9040/ThreadPool.h#L87-L96

There's a race condition. Let's say that 4 worker threads wait on the lock while destructor fires. The destructor unlocks the lock, and calls condition.notify_all(); but if not all workers managed to reach this->condition.wait(...) then not all will be notified. After some time they finally reach this->condition.wait(...) and wait forever and the destructor hangs forever on worker.join();. Unless ThreadPool::enqueue is fired in parallel (forcing notification) but running methods while object is being destroyed is Undefined Behaviour.

fogti commented 4 years ago

"but running methods while object is being destroyed is Undefined Behaviour.", ok, then, what's the problem

RafalSzefler commented 4 years ago

@zserik the problem I experience with the code is a deadlock in the destructor on join(). And this is because one of threads deadlocks on wait().

But after further investigation the code seems to be theoretically correct. This happens on some old Android. I'm starting to think that there's some bug in the kernel, not sure.

fogti commented 4 years ago

ok.

this->condition.wait(lock, [this]{ return this->stop || !this->tasks.empty(); });

This line (from the worker loop) should prevent deadlocks, because it checks the condition before going to wait, and after each wakeup.