ptitSeb / box86

Box86 - Linux Userspace x86 Emulator with a twist, targeted at ARM Linux devices
https://box86.org
MIT License

Performance regression in Starcraft Broodwar 1.16 with cnc-ddraw (with bisect) #823

Open MamiyaOtaru opened 1 year ago

MamiyaOtaru commented 1 year ago

Setup: Brood War using the cnc-ddraw DLL from https://github.com/FunkyFr3sh/cnc-ddraw/releases (release 5.0.0.0), with the OpenGL backend. DLL override: WINEDLLOVERRIDES=ddraw=n,b.

This configuration used to run a lot smoother than without cnc-ddraw. Wine without it (set to use GDI or OpenGL) was just not as smooth. But since the commit below, cnc-ddraw runs with a terrible framerate. Looks like a complex one :-/

632a01dbe6e25cc89455df37ecb73cc2e2f2a4ea is the first bad commit
commit 632a01dbe6e25cc89455df37ecb73cc2e2f2a4ea
Author: ptitSeb sebastien.chev@gmail.com
Date: Sat Feb 11 17:19:05 2023 +0100

    [DYNAREC] Use custom mutex, and improved atomic handling

 src/box86context.c | 37 ++++++++++++++++++---------
 src/custommem.c | 47 ++++++++++++++++++++++++-----------
 src/dynarec/arm_lock_helper.S | 51 ++++++++++++++++++++++++-------------
 src/dynarec/arm_lock_helper.h | 8 +++++-
 src/dynarec/dynablock.c | 31 ++++++++++++-----------
 src/dynarec/dynarec_arm.c | 6 ++++-
 src/emu/x86int3.c | 10 ++++----
 src/emu/x86run_private.c | 4 +--
 src/emu/x86tls.c | 10 ++++----
 src/include/box86context.h | 32 +++++++++++++++++++-----
 src/include/debug.h | 2 +-
 src/include/threads.h | 2 ++
 src/libtools/signals.c | 2 +-
 src/libtools/threads.c | 58 +++++++++++++++++++++++++++----------------
 src/tools/bridge.c | 8 +++---
 15 files changed, 203 insertions(+), 105 deletions(-)

MamiyaOtaru commented 1 year ago

Noticed with Wine 5.13 initially; tried with Wine 8.01, no difference.

MamiyaOtaru commented 1 year ago

In addition: launching with a higher priority (nice -n -10) really tanks performance, which it did not do before.

Launching like so hitches terribly:
sudo --preserve-env nice -n -10 su pi -c "WINEDLLOVERRIDES=ddraw=n,b WINEPREFIX=/home/pi/wine/.wine LD_LIBRARY_PATH=/home/pi/wine/lib/wine/:/home/pi/wine/lib/ /usr/local/bin/box86 /home/pi/wine/bin/wine ./StarCraft.exe"

Launching without nice, like so, is not nearly as bad, but still slower than it used to be:
WINEDLLOVERRIDES=ddraw=n,b WINEPREFIX=/home/pi/wine/.wine LD_LIBRARY_PATH=/home/pi/wine/lib/wine/:/home/pi/wine/lib/ /usr/local/bin/box86 /home/pi/wine/bin/wine ./StarCraft.exe

Without cnc-ddraw, running with nice -10 performs the same as it did before that commit (no regression there, but it was never as smooth as cnc-ddraw used to be).

With cnc-ddraw but without sudo nice, it is better than with cnc-ddraw and nice -10, but still slower than it was before the commit with cnc-ddraw (nice or not).

Before commit https://github.com/ptitSeb/box86/commit/632a01dbe6e25cc89455df37ecb73cc2e2f2a4ea it ran fine and very smoothly with cnc-ddraw (with and without sudo nice), basically equivalent to the native ARM client.

With current box86 head it is playable with cnc-ddraw, but just not as smooth as before. In addition, nice -10 somehow exacerbates it, tanking performance badly. Without nice the performance loss isn't as stark, but since nice exaggerates it and makes it easier to see, it might be helpful for tracking down the performance loss. Again, this seems specific to cnc-ddraw: Wine's built-in ddraw seems to perform the same as before (which is almost there but not quite).

ptitSeb commented 1 year ago

Thanks for the analysis. I'll have a look, it shouldn't be slower.

MamiyaOtaru commented 1 year ago

Same with the current version. Looking at it a bit closer, the CPU is just being used a lot more. Prior to that commit there was one wine Starcraft.exe process using about 50%, with the rest not using much at all. After it, one wine Starcraft.exe process uses 70%+, with others using 20% and 10%. If I use nice -10, the top one uses 90% (it seems to be getting priority), which leads to worse in-game performance than nice 0.

At any rate, CPU usage is quite a bit higher after https://github.com/ptitSeb/box86/commit/632a01dbe6e25cc89455df37ecb73cc2e2f2a4ea when using cnc-ddraw.
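For readers who want to reproduce percentages like the ones above outside of top: a CPU share is simply CPU time consumed divided by wall-clock time elapsed over the same interval. The following is a minimal sketch of that arithmetic in C. The function names (`ts_to_seconds`, `cpu_share_percent`, `measure_cpu_share`) are hypothetical and not part of box86 or Wine, and it measures the calling process rather than an external one.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Convert a timespec to seconds as a double. */
static double ts_to_seconds(struct timespec ts) {
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* CPU share in percent: CPU time consumed divided by wall time elapsed.
 * Can exceed 100 for a multithreaded process on a multicore machine. */
double cpu_share_percent(double cpu_seconds, double wall_seconds) {
    assert(wall_seconds > 0.0);
    return 100.0 * cpu_seconds / wall_seconds;
}

/* Measure the calling process's CPU share while fn() runs.
 * fn is a hypothetical workload; it must run long enough for the
 * monotonic clock to advance, or the assertion above will fire. */
double measure_cpu_share(void (*fn)(void)) {
    struct timespec cpu0, cpu1, wall0, wall1;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);
    clock_gettime(CLOCK_MONOTONIC, &wall0);
    fn();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);
    return cpu_share_percent(ts_to_seconds(cpu1) - ts_to_seconds(cpu0),
                             ts_to_seconds(wall1) - ts_to_seconds(wall0));
}
```

A process that sleeps while waiting accumulates little CPU time, whereas one that polls or spins accumulates CPU time even when it makes no progress, so a higher share does not necessarily mean more useful work.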

Before (cnc-ddraw, nice -10): old

After (cnc-ddraw, nice 0): new

After (cnc-ddraw, nice -10): newLowNice

It would be nice to have cnc-ddraw working with Starcraft like it did before, as it was significantly faster than using Wine's ddraw (before or after commit).

Before (Wine ddraw, nice 0): builtinbefore

After (Wine ddraw, nice 0): builtinafter

MamiyaOtaru commented 1 year ago

Using the same cnc-ddraw with Age of Empires 2 shows a much smaller difference before that commit and after. With nice 0 there is not a discernible difference at all.

Before (AOE2, Nice 0, cnc-ddraw) aoe2before

After (AOE2, Nice 0, cnc-ddraw) aoe2after

With nice -10 there is more of a difference. The busiest wine process has about the same usage (65ish), but there is a secondary wine process that uses 2 before and 8-13 after.

Before (AOE2, Nice -10, cnc-ddraw) aoe2beforeLowNice

After (AOE2, Nice -10, cnc-ddraw) aoe2afterLowNice

Not nearly as big a difference as with Starcraft, and not noticeably detrimental in game with or without nice -10 (I like -10 as there are occasional hitches without it).

tl;dr: Starcraft + cnc-ddraw is worse now, with almost double the CPU usage (though not as high as with native ddraw). AOE2 is the same as before. Nice -10 makes Starcraft even worse now (as in unplayable). It also makes AOE2 worse now, but not noticeably in game. Compared to before, something about Starcraft hits it hard, and running with a higher priority hits it a little.

ptitSeb commented 1 year ago

I thought the custom mutexes were optional in box86, but they are not. I'll make them optional (at compile time) in box86 (as they are in box64), so you can build a version with regular mutexes and compare both versions. It's still strange that there is a big speed impact, as those mutexes are used for internal stuff that is not supposed to wait for long periods of time.
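To illustrate what a compile-time switch between a custom lock and a regular mutex can look like, here is a minimal sketch in C. It is not box86's actual implementation: the macro `DEMO_USE_CUSTOM_MUTEX` and every `demo_*` name are hypothetical, and the "custom" variant is a plain test-and-set lock that yields while waiting, chosen only because it is short.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

#ifdef DEMO_USE_CUSTOM_MUTEX
/* Hypothetical custom lock: spin on an atomic flag, yielding between tries.
 * A waiting thread stays runnable, so waiting costs CPU time. */
typedef atomic_flag demo_mutex_t;
#define DEMO_MUTEX_INIT ATOMIC_FLAG_INIT
static void demo_lock(demo_mutex_t *m) {
    while (atomic_flag_test_and_set_explicit(m, memory_order_acquire))
        sched_yield();
}
static void demo_unlock(demo_mutex_t *m) {
    atomic_flag_clear_explicit(m, memory_order_release);
}
#else
/* Regular mutex: a contended waiter sleeps in the kernel instead. */
typedef pthread_mutex_t demo_mutex_t;
#define DEMO_MUTEX_INIT PTHREAD_MUTEX_INITIALIZER
static void demo_lock(demo_mutex_t *m) { pthread_mutex_lock(m); }
static void demo_unlock(demo_mutex_t *m) { pthread_mutex_unlock(m); }
#endif

static demo_mutex_t demo_g_lock = DEMO_MUTEX_INIT;
static long demo_g_counter = 0;

/* Each worker increments the shared counter *arg times under the lock. */
static void *demo_worker(void *arg) {
    long iters = *(long *)arg;
    for (long i = 0; i < iters; i++) {
        demo_lock(&demo_g_lock);
        demo_g_counter++;
        demo_unlock(&demo_g_lock);
    }
    return NULL;
}

/* Run nthreads workers (at most 16) and return the final counter value. */
long demo_run(int nthreads, long iters) {
    pthread_t tids[16];
    assert(nthreads > 0 && nthreads <= 16);
    demo_g_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, demo_worker, &iters);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    return demo_g_counter;
}
```

Building the same file once with `-pthread` and once with `-pthread -DDEMO_USE_CUSTOM_MUTEX`, then calling `demo_run` from a small `main` under `time` (with and without `nice`), gives a rough way to compare how the two variants use CPU under contention. Whether this matches the behaviour reported here is an open question; the sketch only shows the general trade-off between spinning and sleeping.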

ptitSeb commented 1 year ago

So, I have disabled the use of custom mutexes for now. @MamiyaOtaru, you can update box86 and try again. I assume you should not see the large decrease in performance now.