Open COM8 opened 2 months ago
The ThreadPool being stuck I encountered, It was never stopping, it seems like the issue has to do with the scope of the status_locks, if you notice the Stop() method:
int ThreadPool::Stop() {
std::unique_lock status_lock(status_wait_mutex); // <- this will lock the entire scope of Stop(), after notifying,
// the mutex is still held during the notification which will not give the threads a chance to acquire it and get the notification (in the thread loop)
if (STOP == status) {
return -1;
}
status = STOP;
status_wait_cond.notify_all();
task_cond.notify_all();
for (auto& i : threads) {
if (i.thread->joinable()) {
i.thread->join();
}
}
threads.clear();
cur_thread_num = 0;
idle_thread_num = 0;
return 0;
}
I think it should be:
int ThreadPool::Stop() {
{
// lock only this scope to ensure that the mutex is not held during the notification process, allowing the waiting threads to acquire it immediately after being awakened
std::unique_lock status_lock(status_wait_mutex);
if (STOP == status) {
return -1;
}
status = STOP;
}
status_wait_cond.notify_all();
task_cond.notify_all();
...
}
I tried that locally and it seems the thread pool is waiting and stopping normally so far, I will test with more use cases and get back to you if there is another issue.
Description
There are potential cases where a
cpr::ThreadPool
can get stuck in a potential deadlock. They are both caused by a race condition involvingtask_cond
.It can happen that a thread gets stuck waiting for
task_cond
to get notified but during this there is no one who can notify him like if the caller invokescpr::ThreadPool::Wait()
.Example/How to Reproduce
Run the currently disabled
ThreadPoolTests
from within https://github.com/libcpr/cpr/issues/1035 over and over again. From time to time they will get stuck.Possible Fix
No response
Where did you get it from?
GitHub (branch e.g. master)
Additional Context/Your Environment
Fedora 38, https://github.com/libcpr/cpr/issues/1035