narzoul / DDrawCompat

DirectDraw and Direct3D 1-7 compatibility, performance and visual enhancements for Windows Vista, 7, 8, 10 and 11
BSD Zero Clause License
878 stars 67 forks source link

Wild Wild West: The Steel Assassin configuration window turns black #306

Open automatedbugreportingfacility opened 2 months ago

automatedbugreportingfacility commented 2 months ago

First of all, this is truly an impressive piece of software with great compatibility, I was surprised to find out that an obscure game from 1999 works almost perfectly! Here's one thing I've noticed so far:

When Wild Wild West: The Steel Assassin is launched, a configuration box is displayed. It looks correctly for a short while: 1

Then, Gdi::installHooks() is called and the window turns to this: 2

Once I move the window off the screen and drag it back, it's redrawn correctly: 3

This bug occurs in the full version of the game, I couldn't reproduce it in the demo versions. GPU: GeForce GTX 970 (driver: 552.22) OS: Windows 10 22H2 Log: DDrawCompat-VRDMfc.zip

If the game is run from a non-administrative account, it is necessary to enable "Run this program as an administator" for C:\Program Files (x86)\Southpeak Interactive\west\launch32.exe to fix a DLLRegisterServer error. The binary loading ddraw.dll is \west\bin\VRDMfc.exe.

Please let me know if I should provide any more information.

automatedbugreportingfacility commented 2 months ago

Then, Gdi::installHooks() is called

More specifically, the window turns black after the Dll::createThread() call in Gdi::GuiThread::installHooks().

If a debugger is being used, it might be necessary to ensure that the window of the debugger does not overlap with the game window. Presumably, that may cause the game window to redraw, and the glitch disappears.

narzoul commented 1 month ago

Thanks, this should fix it: ddraw.zip (diff.txt compared to v0.5.2)

automatedbugreportingfacility commented 1 month ago

Thanks for the fix, it seems to work well. I've tested all the various wrappers out there, and yours is currently the only one capable of handling this game -- great job!

Btw, I was looking into the bug a bit, and noticed that it was a race condition -- if the window thread handled WM_WINDOWPOSCHANGED before WM_SYNCPAINT, then things went wrong. If WM_SYNCPAINT came in first, the window was drawn correctly, not sure why. I'll try to figure it out someday, but I'm a complete noob in this graphics stuff.

automatedbugreportingfacility commented 1 month ago

After overcoming some WinDbg idiosyncrasies, I managed to debug the thing, and I noticed that there might be an issue with handling of Window::invalidatedRegion. Here's what happens:

The Gui thread starts the InitTopLevelWindow() call, which is guarded by the critical section lock. In the meantime, the window thread receives a WM_WINDOWPOSCHANGED message and starts waiting to acquire the lock. The Gui thread creates the Window object for the top-level window, initializes Window::invalidatedRegion to a NULLREGION in the constructor, and correctly updates it in updatePosition() to be a SIMPLEREGION whose size corresponds to the entire window rect. The Gui thread then sends a WM_SYNCPAINT message to the window thread -- however, at this point, the window thread is already busy waiting for the lock in WM_WINDOWPOSCHANGED. Then, initTopLevelWindow() completes, the critical section lock held by the Gui thread is released, and the window thread starts its own round of updateAll(). Now, an incorrect (I believe) thing happens: when it encounters the top-level window, it nukes its invalidatedRegion in updatePosition(). Then, the WM_WINDOWPOSCHANGED handler completes, and RedrawWindow() is called in response to the WM_SYNCPAINT message sent from the Gui thread, but there's nothing left to redraw anymore.

While the fix should ensure that the init shouldn't glitch anymore, do you think it's possible that a similar race may occur with multiple window threads? I was also wondering if there was any way to validate that regions have been handled in RedrawWindow() before throwing them away.

narzoul commented 1 month ago

I think you are right. This might be a better fix: ddraw.zip (diff.txt compared to v0.5.2)

I was also wondering if there was any way to validate that regions have been handled in RedrawWindow() before throwing them away.

I'm not sure I understand what you mean by this. There is no mention in the docs that the region must be kept because it might still be accessed after RedrawWindow returns. Then when would it be safe to release it? Pretty sure the function is already done with it by the time it returns.

automatedbugreportingfacility commented 1 month ago

Thanks, the fix looks robust. The only thing I noticed is that the handler for WM_SYNCPAINT doesn't acquire any lock, so it may redraw a window at the same time as another thread is updating windows or also redrawing. Just a thing I noticed -- I have no idea if that poses any correctness risks.

I'm not sure I understand what you mean by this. There is no mention in the docs that the region must be kept because it might still be accessed after RedrawWindow returns. Then when would it be safe to release it? Pretty sure the function is already done with it by the time it returns.

I meant a way to verify that a RedrawWindow() call had been made for a particular invalidated region before discarding it in updatePosition(). But that's not necessary with the changes you've just made.

Btw, I accidentally found another bug affecting this game's configuration window -- do you want me to post here, open another ticket, or wait for v0.5.3 before reporting?

narzoul commented 1 month ago

The only thing I noticed is that the handler for WM_SYNCPAINT doesn't acquire any lock, so it may redraw a window at the same time as another thread is updating windows or also redrawing.

That's how it would normally work too, so I don't think extra synchronization is needed. It's the application's responsibility to synchronize in these cases.

Btw, I accidentally found another bug affecting this game's configuration window -- do you want me to post here, open another ticket, or wait for v0.5.3 before reporting?

Either is fine with me. There is no need to wait for v0.5.3.

automatedbugreportingfacility commented 1 month ago

Okay, so the process implements a lockup detector that can be triggered as follows:

1) Open launch32.exe; 2) Wait 5 seconds; 3) Open Notepad -- the game window should now be in the background; 4) Type something in -- so that attempting to exit Notepad would not close it immediately; 5) Press ALT + F4 -- it's important that this combination is pressed. A confirmation dialog should appear; 6) Wait 2 seconds -- a game dialog with the text "Lockup detected. Do you want to abort this game?" should appear. If it doesn't, dismiss the confirmation dialog and repeat steps 5 & 6.

When the lockup dialog appears, a black rectangle is displayed in the top-left corner. This rectangle is not present when running the game without the wrapper in Windows 10 22H2 or Windows 98 SE.

You can find the lockup handler code in svpdrv.dll+0x138e5:

      DirectDrawFlipToGDI();
      ShowCursor(1);
      pHStack_2f4 = (HWND)0x0;
      tStack_2f0.x = 0;
      tStack_2f0.y = 0;
      GetClientRect(DAT_2701db40,&tStack_2dc);
      ClientToScreen(DAT_2701db40,&tStack_2f0);
      AllocBlankWin(&pHStack_2f4,s_Temp_check_for_close_2701ca04,0x90000000,tStack_2f0.x,
                    tStack_2f0.y,tStack_2dc.right,tStack_2dc.bottom,*DAT_2701ccf8);
      iVar5 = MessageBoxA(pHStack_2f4,s_Lockup_detected._Do_you_want_to_a_2701c9d0,&DAT_2701d644,
                          0x50004);
narzoul commented 1 month ago

I can't repro the issue here on Windows 11, even with the classic version of Notepad (Windows 11 uses a newer Windows Store version by default). After the confirmation dialog appears in Notepad, nothing happens with the game launcher. Tried it without DDrawCompat too.

automatedbugreportingfacility commented 1 month ago

I've actually been looking into it today, and AllocBlankWin() does this:

tagWNDCLASSA local_2c;
local_2c.style = CS_BYTEALIGNCLIENT;
local_2c.lpfnWndProc = (WNDPROC)&LAB_27001000;
local_2c.cbClsExtra = 0;
local_2c.cbWndExtra = 4;
local_2c.hInstance = svpdrv.dll;
local_2c.hIcon = (HICON)0x0;
local_2c.hCursor = (HCURSOR)0x0;
local_2c.hbrBackground = (HBRUSH)0x0;
local_2c.lpszMenuName = (LPCSTR)0x0;
local_2c.lpszClassName = "VrdBlankClass";
RegisterClassA(&local_2c);
hWnd = CreateWindowExA(0,  // dwExStyle
  "VrdBlankClass",     // lpClassName
  "Temp check for close",  // lpWindowName
  WS_VISIBLE | WS_POPUP,   // dwStyle
  0,               // X
  0,               // Y
  800,             // nWidth
  600,             // nHeight
  (HWND)0x0,           // hWndParent
  (HMENU)0x0,          // hMenu
  svpdrv.dll,          // hInstance
  (LPVOID)0x0);        // lpParam
SetWindowTextA(hWnd, "Temp check for close");
SetWindowPos(hWnd,
  HWND_TOPMOST,        // hWndInsertAfter
  0,               // X
  0,               // Y
  0,               // cx
  0,               // cy
  SWP_NOCOPYBITS | SWP_NOREDRAW | SWP_NOMOVE | SWP_NOSIZE);  // uFlags

I believe it's important that dwStyle specifies WS_POPUP -- the window is visible without it.

When DDrawCompat is used, the black window is drawn in updatePresentationWindowPos() when SetWindowPos() is called on the invisible pop-up window's child.

I have no idea why the lockup handler doesn't trigger for you, it works reliably on Windows 10 22H2 here. I've extracted the game code and put it in a test case: main.zip -- please let me know if it works for you. I also managed to force drawing the black rectangle without DDrawCompat, but only if I inject a Sleep() call (not present in the game) before MessageBoxA() and either try clicking multiple times within the invisible window area or right-clicking the window's tab on the taskbar.

narzoul commented 1 month ago

Thanks, your code sample is enough. So the game creates a "visible" window but it doesn't actually paint anything on it, so it appears completely transparent under normal circumstances. The black box with DDrawCompat is visible because it creates a presentation window above all visible windows and redirects presentation to it. This is done to avoid flickering between graphics content that DDrawCompat can hook, and those that it can't (e.g. GDI calls made from kernel mode).

I'm not sure why the window is set to visible in this case. This would cause other issues, like not being able to click through it, and windows behind it not repainting (at least without DWM). Also, if you drag other windows over it (again without DWM, like in Win98), those other windows should smear their image remnants all over it. At least that's what I would expect. This is at least mostly correctly emulated by DDrawCompat.

Unfortunately it is not trivial to track which pixels on such a window have been actually painted before and which ones should still be transparent. In some cases it's even impossible, e.g. if the painting is done through DirectDraw's Lock method on the primary surface. I don't think there is much of a real use case for this anyway, so I'm not planning to improve this situation for now.

automatedbugreportingfacility commented 1 month ago

Thanks for looking into it. I tested on Windows 98 SE, and the smearing effect is there indeed. I also realized why the reproduction steps didn't work for you -- I missed step 0, which is setting the DWORD Logs in HKEY_LOCAL_MACHINE\SOFTWARE\WOW6432Node\Southpeak Interactive to 1. I did that by setting "Alerts Enabled" from WESTGAME_CD/west/Setup/redist/DXTest.exe, and then forgot about that. Sorry!

In the process of testing a performance issue, I accidentally found a crash in DDrawCompat. It can be reproduced by opening launch32.exe and quickly pressing the "Hardware Configuration" button. It's intermittent, so it might be necessary to repeat those steps a few times. The stack trace is as follows:

[0x0]   0x0!+   0x19f8c0   0x5d9acff2   
[0x1]   DDRAW!Compat::queryInterface+0x17   0x19f8c4   0x2150330d   
[0x2]   DDRAW!Compat::queryInterface+0x1f   0x19f8c4   0x2150330d   
[0x3]   DDRAW!CompatPtr<IDirectDraw7>::from+0x1f   0x19f8c4   0x2150330d   
[0x4]   DDRAW!?A0x8905f59c::onDirectDrawCreate+0x27   0x19f8c4   0x2150330d   
[0x5]   DDRAW!`anonymous namespace'::directDrawFunc<0,long (__stdcall*)(_GUID *,IDirectDraw * *,IUnknown *),_GUID *,IDirectDraw * *,IUnknown *>+0xc2   0x19f8c4   0x2150330d   
[0x6]   xvideoio!DirectDrawInit+0xbd   0x19f970   0x27014182   
[0x7]   svpdrv!VRDDirectDrawInit+0x35   0x19f988   0x29024ddd   
[0x8]   svprun!LoadProduction+0x33d   0x19f9e8   0x29024bc6   
[0x9]   svprun!LoadProduction+0x126   0x19fb1c   0x507f32   
[0xa]   vrdmfc + 0x7f32!vrdmfc+0x7f32   0x19fb48   0x77580a46   
[0xb]   ntdll!RtlpValidateHeap+0x21   0x19fbc0   0x0   

DDRAW!CompatVtable<IDirectDrawVtbl>::s_origVtable is 0x0, and so is EIP. The root cause is pretty clear after inspecting the stack trace of another thread:

[0x0]   ntdll!NtCreateFile+0xc   0x26fdc10   0x75bb04cb   
[0x1]   KERNELBASE!CreateFileInternal+0x4eb   0x26fdc14   0x75baffce   
[0x2]   KERNELBASE!CreateFileW+0x5e   0x26fdcd8   0x75bacfa1   
[0x3]   KERNELBASE!CreateFileA+0x31   0x26fdd0c   0x5d76bd68   
[0x4]   ddraw_5d760000!HackMeBaby+0x468   0x26fdd3c   0x5d768b89  // Huh?
[0x5]   ddraw_5d760000!InternalDirectDrawCreate+0x889   0x26fe0e0   0x5d78e0c9   
[0x6]   ddraw_5d760000!DirectDrawCreateEx+0x79   0x26fe7d0   0x5d9ae175   
[0x7]   DDRAW!`anonymous namespace'::installHooks+0x235   0x26fe7f8   0x5d9ad0c1   
[0x8]   DDRAW!`anonymous namespace'::directDrawFunc<16,long (__stdcall*)(int (__stdcall*)(_GUID *,char *,char *,void *,HMONITOR__ *),void *,unsigned long),int (__stdcall*)(_GUID *,char *,char *,void *,HMONITOR__ *),void *,unsigned long>+0x71   0x26fe844   0x502506   
[0x9]   vrdmfc + 0x2506!vrdmfc+0x2506   0x26fe8d0   0x508ab9   
[0xa]   vrdmfc + 0x8ab9!vrdmfc+0x8ab9   0x26fe8e8   0x0   

directDrawFunc() shouldn't proceed until the first thread to hit installHooks() completes the job.

automatedbugreportingfacility commented 1 month ago

It can be reproduced by opening launch32.exe and quickly pressing the "Hardware Configuration" button. It's intermittent, so it might be necessary to repeat those steps a few times.

One more important detail here: CpuAffinity must be set to all. The game does some shenanigans with winmm::timeGetTime(), and CpuAffinity=all is required to fix the frame rate when videos are being played. Apparently, using more cores also makes the race condition described above more likely to occur.

narzoul commented 1 month ago

Thanks, some synchronization is indeed missing there. This should fix that: ddraw.zip (diff.txt compared to v0.5.2) The previous patch is also included.

One more important detail here: CpuAffinity must be set to all. The game does some shenanigans with winmm::timeGetTime(), and CpuAffinity=all is required to fix the frame rate when videos are being played.

I can't tell any difference in the video playback rate. According to DDrawCompat's stats overlay, the videos play at a fixed 15 flips per second either way. Maybe you are referring to the blit rate? But that's probably just the cursor's refresh rate, so it doesn't seem that important during videos.

automatedbugreportingfacility commented 1 month ago

The fix works great, no more crashes, thanks!

As for performance, 2 aspects seem to have an impact on this game: 1) There are 3 page handling settings available in "Hardware Configuration": "No page flipping", "Double buffer" (this is the default one), and "Triple buffer". When the game is run with "No page flipping", animation is a buttery smooth 60 FPS (or whatever the video/animation frame rate is). However, if "Double buffer" or "Triple buffer" is selected, animation becomes more unstable and stuttery. As a test case, select double buffering and "Force software 3D renderer" from hardware configuration, "New game", press Esc to skip the intro and enter the shooting range, and try running left and right. If the game is run without DDrawCompat (DWM8And16BitMitigation required, will glitch but should work), animation of the running character is smooth, not so much with DDrawCompat. Might just be some stupid thing the game does, but if you give me some guidance (i.e. what to use for profiling and what to look for), I could try to investigate. 2) Even with "No page flipping" selected, the performance of cursor animation will degrade without CpuAffinity=all, it's not just very choppy during videos (although it's very prominent there), but also more stuttery during the game (which also plays some animations here and there). This is reflected in the lower blit rate, as you noticed. Incidentally, I played 2 games with DDrawCompat and both work better with CpuAffinity=all (the other being Hooligans: Storm Over Europe, where it fixes a synchronization problem with video subtitles), but this sample size is obviously not big enough to draw any meaningful conclusions :)