Multiple games miss wakeups from PulseEvent (Crysis (1) is very slow to play intro videos, Sonic Generations and Pro Evolution Soccer have audio distortion)

Tk-Glitch commented 6 years ago

With esync enabled, Crysis seemingly hangs on a black screen. Sometimes, after a while, the first frame of the opening video will show. 30-60 seconds later the second frame may or may not appear, ultimately ending up in what looks like a freeze.

Here's a log : https://drive.google.com/file/d/1vPyCKEb-FwH9otxYMFJWguwj64z9RPPq/view?usp=sharing

zfigura commented 6 years ago

Fixed by a7faa850aea0470e503bd8c9e45c8de6b2e5fe6c.

pingubot commented 6 years ago

sadly not fixed for me, using the 32 bit crysis executable.

zfigura commented 6 years ago

This isn't fixed (and I'm not sure why we thought it was). I can reproduce it even after a7faa85.

The game seems to use PulseEvent() as a makeshift timer or semaphore; not quite sure. In esync this is implemented as a write() + read(). I think what's happening here is that both of these go through to the kernel before the polling thread has time to return success (i.e. it wakes up the polling thread, but the count is 0 by the time we get to eventfd_poll()). But this is a guess, since I don't understand very well how the kernel-side code actually works, and unfortunately I can't reproduce the bug with strace to rule out that possibility.

In any case, if that's what's going wrong, this is a WONTFIX, and the correct solution is "just disable esync". It would kind of surprise me if the game benefits greatly from it anyway.

Tk-Glitch commented 6 years ago

I'm taking back my statement. It's indeed not fixed. I'm not sure what happened during testing yesterday as I can reproduce the issue with supposedly the exact same wine version today (even though it's been rebuilt in the meantime).

zfigura commented 6 years ago

It does indeed look like this is a kernel insufficiency: https://gist.github.com/zfigura/6f55688732728d7e1e452188014ec523

On my system the thread never wakes up from poll(), unless I add some sort of delay between the write() and read() calls.

Tk-Glitch commented 6 years ago

Ok that's interesting.. That also explains my results (was running an experimental kernel at the time). I'll try to use the exact same configuration next time :D

zfigura commented 6 years ago

For what it's worth, I think all of these games are going through timeSetEvent(). Not sure that this helps us, though.

Hello71 commented 5 years ago

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data. However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

zfigura commented 5 years ago

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data.

I'm not sure what's meant by this. One eventfd per process? Per handle? That can't work, we need events to be able to be woken up from different processes and handles.

However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

Waking up multiple waiters on an auto-reset event seems extremely risky. Programs could easily be using them as a stupid semaphore or mutex.

Hello71 commented 5 years ago

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data.

I'm not sure what's meant by this. One eventfd per process? Per handle? That can't work, we need events to be able to be woken up from different processes and handles.

One eventfd per waiter process per handle.

However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

Waking up multiple waiters on an auto-reset event seems extremely risky. Programs could easily be using them as a stupid semaphore or mutex.

Oh, you didn't specify auto-reset. Then just set EFD_SEMAPHORE and use eventfd in the regular way? i.e. write the data, then instead of immediately reading it, let the client consume it?

zfigura commented 5 years ago

I think the only way to correctly implement PulseEvent is to have one eventfd per client, and making it the client's responsibility to read the data.

I'm not sure what's meant by this. One eventfd per process? Per handle? That can't work, we need events to be able to be woken up from different processes and handles.

One eventfd per waiter process per handle.

How would that work, then? How would you wake up handle A by signaling handle B to the same object, if they are using different eventfds?

However, if that is too hard, I think you can bodge it by making it the client's responsibility to read the data, even if there are multiple clients. If you make the eventfd non-blocking and ignore EAGAIN, I reckon it'll work much better than the existing system. Even better, it'll work perfectly if there happens to be only one waiter.

Waking up multiple waiters on an auto-reset event seems extremely risky. Programs could easily be using them as a stupid semaphore or mutex.

Oh, you didn't specify auto-reset. Then just set EFD_SEMAPHORE and use eventfd in the regular way? i.e. write the data, then instead of immediately reading it, let the client consume it?

To have PulseEvent() behave like SetEvent()? That seems too risky.

Hello71 commented 5 years ago

How would that work, then? How would you wake up handle A by signaling handle B to the same object, if they are using different eventfds?

Yeah, that doesn't actually fix PulseEvent at all, my mistake.

To have PulseEvent() behave like SetEvent()? That seems too risky.

Ehm... I thought too hard about this issue and mixed up PulseEvent with SetEvent. I think it is possible to implement if there is only WaitForSingleObject. First, define an atomic counter on each auto-reset esync primitive, call it W.

PulseEvent:

if E is an auto-reset event and E.W > 0, then write(E, 1).
if E is a manual-reset event, then write(E, 1).

WaitForSingleObject:

if E is a manual-reset event, do as before. otherwise:
increment E.W.
poll on E.
decrement E.W.
non-blocking read(E). if successful, return. else, go to 2.

It is possible to lose PulseEvent if there are multiple waiters and PulseEvent is used in quick succession. I think it is probably possible to fix this by checking the value of W, or adding another counter.

I tried to think about it but couldn't find any way to make mixed WaitForMultipleObjects work; I think you need two atomic counters, or possibly to lock the event with a mutex (maybe only if PulseEvent is in use)? Is it common to use WaitForMultipleObjects with PulseEvent?

zfigura commented 5 years ago

Your idea is essentially to signal the event if and only if there are waiters, and not to unsignal it in that case. This could work, but would be difficult to synchronize, and it would break with wait-all. I'll have to give it some thought.

Hello71 commented 5 years ago

I think if it's possible to get it to work with wait-multiple, the same tactic should apply with wait-all. Personally, I don't really care about this issue, but if you say it's blocking upstreaming then I'm interested in helping.

zfigura commented 5 years ago

PulseEvent() is tricky, but the biggest problem with upstreaming is, I think, wait-all. That's not something that's exposed by Linux polling APIs, and I think it would probably need either kernel support or excessive locking on the Wine side. The other problem is use of shared memory (reading object state, reading object state atomically with releasing a semaphore).

Hello71 commented 5 years ago

I don't see the problem with the existing wait-all algorithm, except that it potentially breaks ordering in some specific cases. Shared memory is also highly portable, even Windows has it. I don't understand what you mean by "volatile state of semaphores and mutexes", but if you need a shared memory across all processes on the same WINEPREFIX, just open a shared file in that prefix?

zfigura commented 5 years ago

I don't see the problem with the existing wait-all algorithm, except that it potentially breaks ordering in some specific cases.

I'm not sure about ordering, but I guess the main problem is that we can grab objects incorrectly, and then they're unsignaled for a period of time until we release them.

Shared memory is also highly portable, even Windows has it. I don't understand what you mean by "volatile state of semaphores and mutexes", but if you need a shared memory across all processes on the same WINEPREFIX, just open a shared file in that prefix?

As in the owner thread, recursion count, etc. The problem is not portability but rather safety; we can't let one process corrupt the state of another process's objects, even by accident.

I think the proper way forward is to get support for these things in the kernel. It just needs a lot of work.

Hello71 commented 5 years ago

I don't see the problem with the existing wait-all algorithm, except that it potentially breaks ordering in some specific cases.

I'm not sure about ordering, but I guess the main problem is that we can grab objects incorrectly, and then they're unsignaled for a period of time until we release them.

I thought about this and I think you want a RW lock, where W is wait-all and R is clearing a single event (whether by WaitForSingleObject or single WaitForMultipleObjects). So it would look like:

wait-any:

poll
take rdlock
do the needful
release rdlock

wait-all:

poll
take wrlock
poll again with timeout=0. if not all readable, release wrlock and go to 1.
do the needful
release wrlock

reset:

take rdlock
do the needful
release rdlock

I think this is probably the most efficient possible implementation. I'm pretty sure it's not possible to implement in a completely lock-free manner. I tried reading the ReactOS code, but I didn't really understand it.

aufkrawall commented 5 years ago

Just stumbled over this. I guess it can't hurt to report that this is still an issue with both esync and fsync, Crysis 1 demo freezes after starting (wine-staging-tkg ff10ae6e74a8f090f89a217e0ff6da862b6b022b ).

zfigura / wine

Multiple games miss wakeups from PulseEvent (Crysis (1) is very slow to play intro videos, Sonic Generations and Pro Evolution Soccer have audio distortion) #10