narzoul / DDrawCompat

DirectDraw and Direct3D 1-7 compatibility, performance and visual enhancements for Windows Vista, 7, 8, 10 and 11
BSD Zero Clause License

Not working on Battle Realms on AMD Card #101

Closed nicosafull closed 3 years ago

nicosafull commented 3 years ago

Hello,

First of all, nice job. I'm currently the only official maintainer of Battle Realms: Zen Edition. Battle Realms is an RTS that was originally released in 2001; I made it run on Windows 10 and integrated the Steam SDK for multiplayer. I've worked on it a lot, but 3D is not my area of expertise. Adding your wrapper boosts FPS a lot on every card except AMD ones. How can that happen? I can provide a .log from a user who plays with an AMD card. He told me it had no benefit for him.

Although the game runs smoothly, I've noticed some loss of graphic detail, as if it were running in 16-bit (the game supports both 16-bit and 32-bit), but only on some particular effects, not all of them.

I'm really interested in including this, as it solves many issues. The only glitch I've seen is that the Steam overlay misplaces textures, but that's all. Could that be because the log shows "DWM8And16BitMitigation"?

Would you mind helping me, or is there some way I could reach out to you? :) Thanks again for the great work.

I put the log below:

    00:05:20.726 Process path: C:\Program Files (x86)\Steam\steamapps\common\Battle Realms\Battle_Realms_F.exe
    00:05:20.726 Environment variable __COMPAT_LAYER = "DWM8And16BitMitigation"
    00:05:20.726 Loading DDrawCompat dynamically from C:\Program Files (x86)\Steam\steamapps\common\Battle Realms\ddraw.dll
    00:05:20.759 DDrawCompat loaded successfully
    00:05:20.760 Installing display mode hooks
    00:05:20.769 Installing registry hooks
    00:05:20.770 Installing Direct3D driver hooks
    00:05:20.770 Installing Win32 hooks
    00:05:20.783 Hooking user mode display driver: C:\WINDOWS\System32\DriverStore\FileRepository\38_hp_whiskylake_hws_iigd_dch.inf_amd64_23658979f54878c1\igdumdim32.dll+0xad80
    00:05:20.817 Dynamic vertex buffers are not available
    00:05:20.817 Dynamic index buffers are not available
    00:05:20.819 Checking source color key support: passed
    00:05:20.821 Installing DirectDraw hooks
    00:05:20.821 Installing Direct3D hooks
    00:05:20.828 Installing GDI hooks
    00:05:20.901 Finished installing hooks
    00:22:52.669 DDrawCompat detached successfully

narzoul commented 3 years ago

Is that log supposed to be from the AMD system? Because it clearly shows that Intel GPU drivers are loaded (igdumdim32.dll). I guess it's a hybrid GPU laptop.

nicosafull commented 3 years ago

No, sorry, that was from my setup (8th gen Intel i7). This one is from the AMD user:

    05:23:22.993 Process path: G:\SteamLibrary\steamapps\common\Battle Realms\Battle_Realms_F.exe
    05:23:22.993 Environment variable __COMPAT_LAYER = "DWM8And16BitMitigation HighDpiAware"
    05:23:22.993 Loading DDrawCompat dynamically from G:\SteamLibrary\steamapps\common\Battle Realms\ddraw.dll
    05:23:23.011 DDrawCompat loaded successfully
    05:23:23.011 Installing display mode hooks
    05:23:23.017 Installing registry hooks
    05:23:23.017 Installing Direct3D driver hooks
    05:23:23.018 Installing Win32 hooks
    05:23:23.030 Hooking user mode display driver: C:\Windows\System32\DriverStore\FileRepository\u0365275.inf_amd64_136741f59e43f995\B364966\aticfx32.dll+0x60520
    05:23:23.071 Dynamic vertex buffers are available
    05:23:23.072 Dynamic index buffers are available
    05:23:23.075 Checking source color key support: failed (test result pattern is incorrect: 0xfa9f)
    05:23:23.076 Incorrect z-buffer bit depth capabilities detected; changed from "16, 32" to "16, 24"
    05:23:23.078 Installing DirectDraw hooks
    05:23:23.079 Installing Direct3D hooks
    05:23:23.086 Installing GDI hooks
    05:23:23.153 Finished installing hooks
    05:24:49.062 DDrawCompat detached successfully

narzoul commented 3 years ago

Well, there is nothing really unusual in those logs. He must have one of those newer generation AMD GPUs (RX 5000 series) because it's affected by a known color key driver bug. Does the game use Blt or BltFast with color keys? There's a compatibility fix in DDrawCompat for this issue, which basically forces those blits to be done by the CPU, which could have some performance overhead, especially on high resolutions.
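
To give a rough idea of what "done by the CPU" means for a color-keyed blit, here is a minimal sketch of a source color key copy over 16-bit pixels. This is not DDrawCompat's actual code: the function name `colorKeyedCopy16` and the 16-bit pixel format are assumptions made for illustration only. Every pixel is read and compared individually, which is why the cost grows with resolution.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, for illustration only.
// Copies a width x height block of 16-bit pixels from src to dst and
// skips every source pixel that equals colorKey (source color keying).
// Pitches are given in bytes, as in DirectDraw surface descriptions.
void colorKeyedCopy16(std::uint8_t* dst, std::size_t dstPitch,
                      const std::uint8_t* src, std::size_t srcPitch,
                      std::size_t width, std::size_t height,
                      std::uint16_t colorKey)
{
    for (std::size_t y = 0; y < height; ++y) {
        auto* dstRow = reinterpret_cast<std::uint16_t*>(dst + y * dstPitch);
        auto* srcRow = reinterpret_cast<const std::uint16_t*>(src + y * srcPitch);
        for (std::size_t x = 0; x < width; ++x) {
            if (srcRow[x] != colorKey) {
                dstRow[x] = srcRow[x];  // opaque pixel: copy it
            }
            // transparent pixel: leave the destination untouched
        }
    }
}
```

If the surfaces involved live in video memory, a loop like this also has to read them back through the CPU, which tends to be slow. That part depends on the driver, so take it as a general expectation rather than a measurement.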

I have an older generation AMD GPU (RX 480) and I didn't notice any issues, though I only have the GOG version of the game for testing. So it could also be due to some difference between the two game versions.

Anyway, please try to get debug logs of the issue using the ReleaseWithDebugLogs build, maybe that'll show something. Which parts of the game are affected? Menus, 3D scenes or both?

About the loss of graphic detail: if you mean loss of color resolution, in theory DDrawCompat shouldn't have any effect on that, unless it's some really odd bug in the code. Need more info on that one too. At least some screenshots showing the issue or a save game that can quickly reproduce it (if it's compatible with the GOG release) would be helpful.
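
For reference, this is roughly what a loss of color resolution looks like in numbers. The sketch below assumes the common RGB565 layout for 16-bit color (the game may use a different 16-bit format), and the function names are made up for the example. Each channel keeps only its top 5 or 6 bits, so smooth gradients such as fire or glow effects are the first place banding shows up.

```cpp
#include <cstdint>

// Packs an 8-bit-per-channel color into 16-bit RGB565 (assumed layout),
// discarding the low 3 bits of red and blue and the low 2 bits of green.
std::uint16_t packRgb565(std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
    return static_cast<std::uint16_t>(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

// Counts how many distinct 16-bit values a full 256-step red ramp maps to.
int distinctRedLevels()
{
    int count = 0;
    int last = -1;
    for (int r = 0; r < 256; ++r) {
        const int packed = packRgb565(static_cast<std::uint8_t>(r), 0, 0);
        if (packed != last) {
            ++count;
            last = packed;
        }
    }
    return count;
}
```

With this layout, eight neighbouring 8-bit red values collapse into one 16-bit value, so a 256-step ramp ends up with only 32 distinct levels.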

nicosafull commented 3 years ago

Yes, you are right, he has a 5700 XT. The particular issue shows up when there's fire in the game, but overall he gets very bad performance. He works around it with dgVoodoo, as many other players do. I'm attaching a savegame (a file with the .br1 extension, which must be unzipped into the Saved Games folder) and his debug logs. The savegame is a stress test of a huge battle; with your DLL (in my case) or dgVoodoo (in his case), the game runs super smooth.

Moreover, would you mind helping me with this if I gift you a copy and give you access to source control? I'm really willing to find a way to compensate you for the effort, and I'm sad that I don't really know how to address 3D issues :(. Or maybe you could join our official Discord server? discord.gg/FYjtaUh

Savegame debugging log: https://wetransfer.com/downloads/271ab06af2f8acbead815b1d0adc1b9f20210525185955/83278a

narzoul commented 3 years ago

I don't see anything in particular that's stalling for a long time in those logs. I thought it may be similar to another issue I was investigating a while ago, where copying surfaces between system and video memory was unusually slow on some old AMD drivers, but strangely enough only if the system memory surface was aligned to 32 bytes (which seems to be the optimal value on other systems). With well-aligned surfaces, it could stall up to 100-200 ms during a copy, but with intentionally misaligned surfaces, it was about as fast as on other systems.

Unfortunately, nothing like that seems to be going on here. Color keying isn't used either. So I have no idea why performance would be so bad on that GPU. It would require experimenting on that specific hardware to try to narrow it down. It could just be another driver issue, and maybe the only reason it's working better with dgVoodoo is because it's translating everything to Direct3D 11 or 12, which are implemented by newer driver APIs that may simply be better developed on those GPUs.

I don't think the source code would help me much with this either since I'm unable to reproduce the problem in the first place, so I have no idea if any change I'd make would have any effect on that particular system, even if it seems to have no effect on mine. But more importantly, I don't have the time to work on another project since I'm way behind even on this one as it is.

Actually, I seem to have pretty much the same performance with or without DDrawCompat in this game with the GOG release. It doesn't seem like the few Direct3D performance optimizations I've added are doing much at all for this game, at least on my system (with the RX 480).

A few things that I did notice from the logs that could improve performance with some source code changes:

  1. Use 32-bit display modes instead of 16-bit. According to the log, the display mode is still being set to 16-bit even though you mentioned the game should support 32-bit also. This will usually avoid the performance penalty of Windows having to emulate the 8/16-bit color modes since Windows 8. The emulation was especially slow in Windows 8, and although it improved a lot in Windows 10, it's probably still going to be slower than 32-bit, which is supported natively.
  2. Use exclusive full-screen mode. Ever since Vista (I think), full-screen DirectDraw-based applications are running in a fake (windowed) full-screen mode by default. This could also have some performance cost and it's relatively easy to re-enable exclusive full-screen mode. You can either add an application compatibility shim ("DXPrimaryEmulation" shim, with the parameter set to "-DisableMaxWindowedMode") if the installer can do that, or you can just call SetAppCompatData(12, 0) somewhere during initialization, before any DirectDraw functions are called. It's an undocumented function in ddraw.dll so you'd have to load it via GetProcAddress. You can use HRESULT WINAPI SetAppCompatData(DWORD, DWORD) as the function signature. This is basically what the shim and DDrawCompat do as well.
  3. Eliminate redundant state changes in the Direct3D code. Check the SetRenderState/SetTextureStageState calls in the logs. For example, the following render states seem to be set to the same values over and over again, probably unnecessarily:
    3ab4 21:59:03.713 < IDirect3DDevice7Vtbl::SetRenderState(11A193C8, 145, 1) = 0
    3ab4 21:59:03.713 > IDirect3DDevice7Vtbl::SetRenderState(11A193C8, 147, 1)
    3ab4 21:59:03.713 < IDirect3DDevice7Vtbl::SetRenderState(11A193C8, 147, 1) = 0
    3ab4 21:59:03.713 > IDirect3DDevice7Vtbl::SetRenderState(11A193C8, 146, 2)
    3ab4 21:59:03.713 < IDirect3DDevice7Vtbl::SetRenderState(11A193C8, 146, 2) = 0
    3ab4 21:59:03.713 > IDirect3DDevice7Vtbl::SetRenderState(11A193C8, 148, 0)
    3ab4 21:59:03.713 < IDirect3DDevice7Vtbl::SetRenderState(11A193C8, 148, 0) = 0

    You can see which states those numbers correspond to even in the Direct3D 9 docs, in case you don't have the DirectX 7 SDK docs: https://docs.microsoft.com/en-us/windows/win32/direct3d9/d3drenderstatetype Similarly for SetTextureStageState, the texture stages between 2 to 7 are probably not even used, but some of their states still seem to be reset over and over to the same value.
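
For item 3, one common way to filter out redundant state changes is a small cache in front of the device. The following is only a sketch under assumptions: `RenderStateCache` and its members are hypothetical names, and the setter callback stands in for whatever the game uses to call SetRenderState on its IDirect3DDevice7.

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical helper: forwards a render state change only when the
// value differs from the last value forwarded for that state.
class RenderStateCache {
public:
    using Setter = std::function<void(std::uint32_t state, std::uint32_t value)>;

    explicit RenderStateCache(Setter setter) : m_setter(std::move(setter)) {}

    // Returns true if the call was forwarded, false if it was skipped
    // because the cached value is already the same.
    bool set(std::uint32_t state, std::uint32_t value)
    {
        const auto it = m_values.find(state);
        if (it != m_values.end() && it->second == value) {
            return false;
        }
        m_setter(state, value);
        m_values[state] = value;
        return true;
    }

    // Forget all cached values, e.g. after the device is recreated or
    // when some other code path changes states behind the cache's back.
    void invalidate() { m_values.clear(); }

private:
    Setter m_setter;
    std::unordered_map<std::uint32_t, std::uint32_t> m_values;
};
```

In the game, the setter would wrap something like `device->SetRenderState(static_cast<D3DRENDERSTATETYPE>(state), value)`. The same idea applies to SetTextureStageState if the cache is keyed on the stage and state pair. Whether this gives a measurable gain on the affected GPU is untested.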

Vertex buffer lock flag usage generally seems to be good and I didn't see any other surface locking or Blt calls that could stall, so I'm not sure what else could be done to improve things, apart from rewriting everything in a more modern API like DX11. But honestly I'm not a Direct3D developer either, what little knowledge I picked up comes from the DX7 SDK. Never wrote or used any 3D engines myself.
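
Going back to item 2 in the list, loading and calling SetAppCompatData could look roughly like this. It is a sketch only: `disableMaxWindowedMode` is a made-up helper name, the function signature is the one given in item 2, and since the export is undocumented there is no guarantee it behaves the same on every Windows version. The non-Windows branch exists only so the snippet compiles elsewhere.

```cpp
#ifdef _WIN32
#include <windows.h>

// Signature from item 2 for the undocumented ddraw.dll export.
typedef HRESULT (WINAPI *SetAppCompatDataFunc)(DWORD, DWORD);
#endif

// Hypothetical helper. Call it once during initialization, before any
// DirectDraw function is called. Returns true only if the export was
// found and SetAppCompatData(12, 0) reported success.
bool disableMaxWindowedMode()
{
#ifdef _WIN32
    // The module is intentionally left loaded; the game uses it anyway.
    HMODULE ddraw = LoadLibraryA("ddraw.dll");
    if (!ddraw) {
        return false;
    }

    FARPROC proc = GetProcAddress(ddraw, "SetAppCompatData");
    if (!proc) {
        return false;
    }

    SetAppCompatDataFunc setAppCompatData =
        reinterpret_cast<SetAppCompatDataFunc>(proc);
    return SUCCEEDED(setAppCompatData(12, 0));
#else
    return false;  // ddraw.dll only exists on Windows
#endif
}
```

A failed call is harmless here: the game would simply keep running in the default windowed full-screen mode, so the return value is mostly useful for logging.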

nicosafull commented 3 years ago

Thank you very much for the super detailed response. You probably don't see any improvement because AMD hardware just performs badly in this game, for a reason I'm still trying to find out. On Intel and Nvidia GPUs, the game behaves much better with DDrawCompat than without it. AMD is the real problem, but I will take your suggestions into account. I really appreciate the effort you put into both this software and your response and further research. Keep up the good work :)

narzoul commented 3 years ago

Sorry, I forgot to mention this important detail, but by "same performance with or without DDrawCompat" I meant that I get solid 60+ FPS even with many units on screen. Granted, the GOG version only goes up to 1024x768 resolution. Maybe on higher res more units fit on screen and that can drag FPS down further. Is performance worse than 60 FPS on RX 5700 XT at 1024x768?

nicosafull commented 3 years ago

Yes. The Steam version allows 16:9 resolutions, which is why I wanted to gift it to you :). Since I've increased the field of view, more rendering takes place, so performance undoubtedly tends to drop. The redundant calls may be because the game has a dynamic texture quality system: if FPS drops below a certain threshold, it automatically adjusts quality to suit the current FPS. I understand your points and of course really appreciate them, but I can't find how the game forces 16 bits over 32. I guess I will have to remove the 16bpp sections and "ifs" and check how the game handles it. I can also relate to the "fake full-screen" point, since I see the game start windowed and then go full-screen, but it's difficult for me to follow the 3D code.