Trzik closed this issue 1 year ago.
Yes, that's a reasonable explanation. I'm preparing a test case now.
This can be reproduced with https://github.com/lhmouse/mcfgthread/blob/d283e3f095c5c4e6a4f1c2d9ae74415cc8df85e9/test/cond_multi_wait.c.
Thanks for the report.
I have a test that makes heavy use of condition variables, and it started deadlocking when I switched to the mcfgthread implementation. With a little digging, I found that eventually one thread gets stuck in `_MCF_cond_wait` and the other in `_MCF_cond_signal_some_slow`.

The signaling thread goes into `__MCF_batch_release_common`, calls `__MCF_keyed_event_signal`, and ends up in `NtReleaseKeyedEvent` with a NULL (infinite) timeout. The waiting thread gets stuck in this `__MCF_keyed_event_signal` call, which ends up in the same `NtReleaseKeyedEvent` call, but with a zero timeout: https://github.com/lhmouse/mcfgthread/blob/ff795e30999ebed91be6bb3bd9cab2ffebec3b61/mcfgthread/cond.c#L83

I don't understand the code that well, but isn't this line a bug? Shouldn't `__MCF_keyed_event_wait` be invoked instead? I did an experimental rebuild of the library with this change, and it fixed the deadlock for me. However, since I see a similar line in `event.c`, this may be intentional, in which case I don't know where the real root cause lies.