xenia-project / xenia

Xbox 360 Emulator Research Project
https://xenia.jp
Other
8.26k stars 1.14k forks source link

[Linux, threading] WaitMultiple causing deadlock #1677

Open JoelLinn opened 4 years ago

JoelLinn commented 4 years ago

From PR #1317:

https://github.com/xenia-project/xenia/blob/d1f7ee35933b1eb0ba8a87c406538a8e81a30662/src/xenia/base/threading_posix.cc#L220-L223

@Triang3l :

This will probably cause a deadlock on the next thread doing any waiting if the thread is suspended between locking and waiting.

Suspension needs rework to only occur immediately when the thread is in guest code, but deferred until the return to the guest if it happens during a kernel call. Or, for performance reasons, possibly not on every guest–host boundary, but when a special kind of mutex, similar to global_critical_region but not limited to a single instance, is locked/unlocked.

This should be somewhat easy to implement on POSIX because the thread itself will reply to the suspension signal — it can set a flag in its context structure when it enters a kernel function, and the signal handler can check whether to sleep immediately or to defer the suspension.

On Windows, however (while there is a native WaitForMultipleObjects implementation that will work fine, having control over suspension would allow us to have more granular kernel mutexes rather than just a single global_critical_region which is our current way of preventing suspension when kernel state is being modified, with easier usage of condition variables in the kernel — would possibly be useful for GPU readback in the access violation handler), the target thread can't provide its own response to suspension, and there are no signals. A possible solution is making suspension synchronous (which is contrary to the way it actually works on Windows, but possibly should have no noticeable effect on anything), with the source thread, not the target, being partially responsible for checking whether the target is in kernel code: call SuspendThread, then GetThreadContext to synchronize with the suspended thread, and check whether it's sleeping in kernel code currently — if yes, notify it that suspension was requested and unsuspend it on the host, to let it self-suspend on the next kernel to guest return.

Triang3l commented 3 years ago

The rules of suspension probably apply to termination also — on Windows we don't seem to care either.

JoelLinn commented 2 years ago

Edit: I addressed this in #1992

Posting this in here since the ticket is most relevant to the region and I don't want to create another. So as established in #1678, the condition is global and a completely unrelated wait might be signalled.

So what if our conditions are untouched by the initiating signal that woke us? How does another thread that really waits on the source signal get the message? I think this is the main issue here.

https://github.com/xenia-project/xenia/blob/6b45cf84472c26436d4a5db61a7b50dab301e398/src/xenia/base/threading_posix.cc#L278-L301

I think we need to maintain a list of waits and ~(option A) loop through it on every new signal (under the lock protection), checking if the other (if any) objects are also signalled and select the first matching wait list. Or~ (option B) notify all threads and have them check individually.