Closed: cvtsi2sd closed this issue 4 years ago
This really does seem to be a bug. I can reproduce this deadlock on my Windows 7 machine.
I have almost forgotten the details. Look at https://github.com/lhmouse/mcfgthread/wiki/Pseudo-Code#condition-variable: the two paths of condition variables (spinning and non-spinning, following `if(bSpinnable) {`) look so different, and I didn't think it was worth implementing (if not optimizing), as in normal use cases condition variables should be protected by mutexes. Once flags also don't have this spinning loop. I think we should remove it now.
After the removal your program no longer deadlocks.
I presume the deadlock happened because the signaling thread could run so quickly that it received and cleared the signal itself, before the busy-waiting thread could receive it. As its own condition was not met, it then started another wait on the same condvar, while the waiting thread had lost the signal. Hence both threads ended up blocked on the same condvar.
This renders the old implementation incorrect: it was invalid to unlock the mutex before bumping the trapped thread counter. Thanks for the report; it should work as expected now.
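For illustration, the usage pattern discussed here (two threads that each signal and then wait again on the same condition variable) can be sketched roughly as follows. This is a hypothetical minimal example written against the standard `std::condition_variable`; it is not the code from the report and does not show mcfgthread's internals. The names `PingPong`, `play` and `run_ping_pong` are made up for this sketch.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical ping-pong state; all names here are illustrative only.
struct PingPong {
    std::mutex mtx;
    std::condition_variable cv;  // a single condvar shared by both threads
    int turn = 0;                // 0: first player may run, 1: second player may run
    int hits = 0;                // total number of completed turns
};

// Each player waits for its turn, records a hit, hands the turn over and signals.
void play(PingPong& pp, int me, int rounds) {
    for (int i = 0; i < rounds; ++i) {
        std::unique_lock<std::mutex> lock(pp.mtx);
        // The predicate is checked under the mutex, so a spurious wakeup,
        // or a wakeup while it is not our turn, simply waits again.
        pp.cv.wait(lock, [&] { return pp.turn == me; });
        ++pp.hits;
        pp.turn = 1 - me;
        // Right after signaling, this thread loops and waits on the same
        // condvar again, which is the situation described above.
        pp.cv.notify_all();
    }
}

// Runs both players to completion and returns the total number of turns.
int run_ping_pong(int rounds) {
    PingPong pp;
    std::thread a(play, std::ref(pp), 0, rounds);
    std::thread b(play, std::ref(pp), 1, rounds);
    a.join();
    b.join();
    assert(pp.hits == 2 * rounds);
    return pp.hits;
}
```

With a correct condition variable, `run_ping_pong(n)` returns `2 * n` for any non-negative `n`; a lost wakeup would instead show up as both threads blocking forever in `wait`.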
Code doing a ping-pong between two threads deadlocks when compiled with a mcfgthread-based cross-toolchain (64 bit, gcc 8.3.0-based, with the provided MCF patches applied) and run under wine >= 5.0 (5.0 on Ubuntu 20.04 x64, 5.9 on Arch Linux x64, plus various tests with Proton 5.0 using esync, again on Arch Linux x64). No problems when running under wine 3.0.
Here's the sample code: https://gist.github.com/cvtsi2sd/b9c534e6f447f35c781fc20deb3989b9
This also works fine (terminating with
) when:
When it deadlocks (= exiting only due to timeout), I get it locked up at random stages.
Interestingly, if I don't run it under `time`, it seems to get further:
Between wine 3.0 and wine 5.0 I noticed several changes in `sync.c`, some to supposedly fix #36, some reimplementing SRW stuff basing it on futexes; I think they may be related, but I'm wondering if it's their implementation that is broken (it may be, I saw several very recent bugfixes to that file regarding keyed events), or the change of underlying implementation actually exposed a bug in MCF.