narzoul / DDrawCompat

DirectDraw and Direct3D 1-7 compatibility, performance and visual enhancements for Windows Vista, 7, 8, 10 and 11
BSD Zero Clause License

Glover with accelerated video cards #295

Closed ghotik closed 2 months ago

ghotik commented 2 months ago

Hi there, I wondered if you might be interested in an old glory that I only recently got working in sw emulation mode. The game is "Glover", and here https://sourceforge.net/p/dxwnd/discussion/general/thread/facff30400/#1bc0 you can find some discussion and hints about how to set up a possible testbed. Things get interesting when you try to configure the game not for sw emulation but for one of the many accelerated video cards listed in the configuration program. Looking at the logs, you can see that no matter what you choose, the game uses Direct3D2 to render the textures, but there are terrible z-buffering problems because (I suppose) the scene textures are covered by the background textures: you see a huge background where only a few scene elements sometimes appear, as if emerging from a flood. The problem looks like other z-buffering problems that we have already met (and that you fixed with DDrawCompat), but the difference here is that we can't find any trick that works, and even DDrawCompat is no help because it shows only a black screen. Doesn't this trigger your curiosity? Of course, if you can somehow fix the problem, I will be interested in replicating the fix in DxWnd.

ghotik commented 2 months ago

I should add that I have just read the post in https://github.com/narzoul/DDrawCompat/issues/293 and tried the patched ddraw.dll.
Your last fix fixed the Glover black screen too. So now I can run the game with DxWnd + DDrawCompat, and I can confirm that the result is pretty much the same: setting a 3D accelerated card in the game configuration shows this effect where the scene textures are either absent or hidden behind the curtain of some background texture (see screenshot).

ddrawcompat

P.S. Running the game in wireframe mode (so that all the texture borders should be visible), it seems that the missing parts of the scene are not at the back of the Z-ordering; they are just missing.

ghotik commented 2 months ago

Some news: by chance we found that the WineD3D ddraw replacement fixes the problem if used with DxWnd + "fix texture pixel format". I believe that Direct3D2 is used in a non-canonical way that WineD3D tolerates and fixes, while the system ddraw doesn't like it, at least for most of the textures. It would be interesting to find the non-canonical detail, because I suppose it could then be easy to fix in both DDrawCompat and DxWnd without necessarily using WineD3D. More details here: https://sourceforge.net/p/dxwnd/discussion/general/thread/facff30400/?page=1#6c15

narzoul commented 2 months ago

I did not see a solution or even a description of the first obvious problem. As soon as the intro is skipped and the game starts rendering, it looks as if it's only rendering every second frame, and the others are just a solid light blue background. The issue happens with native ddraw too.

Then, after some more testing, the issue magically disappeared, and everything seemed to be rendered correctly with DDrawCompat v0.5.1. But there was some heavy stuttering.

After some more testing, the flashing issue came back.

But, during all the test runs, I never saw any z-buffer issues. I just started a new game and everything seemed to be rendered correctly. Is it not supposed to be obvious at the start of a new game? Do I have to go somewhere?

I'm fairly certain none of these issues are related to DirectDraw, and each run is just at the mercy of some uninitialized stack variables or similar, just like it was the case with Panzer Commander (https://github.com/narzoul/DDrawCompat/issues/200).

If you can send a debug log with DDrawCompat alone (no DxWnd in the mix please), I could try to compare it with mine to see what could be the difference.

ghotik commented 2 months ago

I suppose you requested a game log from a session run with DDrawCompat 0.5.1 configured for LogLevel=debug. The big log file is attached. The game had heavy flickering, but this doesn't worry me. Instead, in the scene screens (that is, all 3D scenes that are not menus or similar parts) most of the scene is missing. As a reference, I ended the session by killing the game while it was showing the red flaming background that I posted in the screenshot above. Again, WineD3D seems to fix the problem, but since it is a proxy ddraw.dll too, we can't stack WineD3D on top of DDrawCompat and compare the DDrawCompat logs with WineD3D.

DDrawCompat-glover.zip

To answer your question

> Is it not supposed to be obvious at the start of a new game? Do I have to go somewhere?

The game renders all menu and transition screens more or less correctly, but it fails to render most of the solid objects in 3D scenes. The first troubled sequence is at the beginning, after the company logo, where there is a "Glover" and "Press any key" text. I see a white, cloudy, empty landscape, but there should be a castle and some land in the scene. An even better test case comes after you press the "start" command, which leads to the red screenshot mentioned above. All this happens when you set a video card model close enough to a standard D3D device, like "Intel i740". The "No 3D-card" selections instead activate a sw renderer that is slower and grainy, but accurate.

For a better reference, the picture below shows how the red flaming screen should be rendered (picture taken using DxWnd + WineD3D)

correct

narzoul commented 2 months ago

Thanks for the extra details! I don't have much time to check the logs deeply now, but from a quick glance, I see that it uses "depth blt" to clear the depth buffer (see the pfnDepthBlt driver calls in the log). It's been known to have issues on some drivers, so DDrawCompat replaces it with pfnClear (see D3dDdi::Resource::depthFill).

I vaguely recall also seeing some issues with this recently, where the driver ignored whatever depth value was used for clearing, and just used a fixed 1.0f instead. This is also one of those strange cases where the game wants to use a fill value of 0.0f. I can't remember which game and which driver had similar problems before (it definitely wasn't igdumdim32.dll though, because I haven't had that one for years). But sure enough, if I override the depth value to 1.0f, then I get results very similar to what you've shown and explained.

I'll try to put together a shader to clear the depth buffer later, to see if it produces different results. In the meantime, if you have some time to experiment, you could try hooking the surface Blt call when it receives DDBLT_DEPTHFILL, and try to implement it via surface locking instead, and just memset the whole surface memory to 0.

ghotik commented 2 months ago

I did the experiment by adding the following source code inside the Blt wrapper. The code runs when DDBLT_DEPTHFILL is requested on the ZBuffer surface (verified with the logs). Sadly, the result doesn't change: the screen looks the same as without this patch. In any case, the trick may come in handy on some other occasion, you never know, so it wasn't a waste of time.

    if (dwflags & DDBLT_DEPTHFILL){
        // Emulate the depth fill: lock the z-buffer surface and zero its memory by hand.
        DDSURFACEDESC2 ddsd;
        memset(&ddsd, 0, sizeof(ddsd));
        ddsd.dwSize = (dxversion <4) ? sizeof(DDSURFACEDESC) : sizeof(DDSURFACEDESC2);
        res=(*pLockMethod(dxversion))(lpdds, NULL, (LPDDSURFACEDESC)&ddsd, DDLOCK_WRITEONLY, NULL);
        if(res) {
            OutTrace("%s: Lock ERROR res=%#x(%s)\n", ApiRef, res, ExplainDDError(res));
            return res; // the lock failed, so lpSurface is not valid to write to
        }
        // lPitch is the byte width of one row, including any padding.
        DWORD zsize = ddsd.lPitch * ddsd.dwHeight;
        OutTrace("!!! zsize=%d\n", zsize);
        memset(ddsd.lpSurface, 0, zsize);
        res=(*pUnlockMethod(dxversion))(lpdds, NULL);
        if(res) OutTrace("%s: Unlock ERROR res=%#x(%s)\n", ApiRef, res, ExplainDDError(res));
        return DD_OK;
    }

Looking at the game logs, I got the impression that the game renders the scene and the background separately in two BeginScene/EndScene blocks. Could the problem depend on a missing ZBuffer clear at the end of one of these blocks? I may try ...

narzoul commented 2 months ago

I haven't got much further with the depth buffer issue (not being able to reproduce it makes it much more difficult to debug it), but I noticed something odd in both of our logs: every time DDBLT_DEPTHFILL is used, the driver receives a pfnSetRenderTarget call to switch between two render targets. The first one is the original back buffer, but the second one is some invalid junk with a null handle and some random number as the SubResourceIndex:

3334 11:46:07.160   > _D3DDDI_DEVICEFUNCS::pfnSetRenderTarget(&08B9A240, *{0,&08BCC8E8,0})
3334 11:46:07.160   < _D3DDDI_DEVICEFUNCS::pfnSetRenderTarget(&08B9A240, *{0,&08BCC8E8,0}) = 0
...
3334 11:46:07.212   > _D3DDDI_DEVICEFUNCS::pfnSetRenderTarget(&08B9A240, *{0,null,4294967289})
3334 11:46:07.212   < _D3DDDI_DEVICEFUNCS::pfnSetRenderTarget(&08B9A240, *{0,null,4294967289}) = 0

I suppose the switch itself makes sense, because it happens directly after Flip, and the runtime is then supposed to swap the primary surface with the back buffer, so an implicit render target switch is indeed needed at the driver level. It turns out that the junk value however is a consequence of the game creating the primary surface chain in a peculiar way: it creates a separate front and back buffer, and manually attaches them together with AddAttachedSurface. The only problem with this is that it doesn't use the DDSCAPS_3DDEVICE cap on the front buffer, so whenever the Flip would switch to that surface, some junk is passed to the driver instead of a valid render target. This is what causes the epilepsy-inducing flashing, with only every second frame getting rendered. I don't know if this scenario was ever supposed to work without the proper caps, or if it's just that the pre-WDDM runtime/drivers were more forgiving about this omission.

With a quick hack, the above issue is at least pretty easy to solve, so here's a test version with the added DDSCAPS_3DDEVICE cap on the front buffer: ddraw.zip (diff.txt compared to v0.5.1) The black screen patch is also included (even though I don't have that issue myself).
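As an illustration of the cap fix described above, here is a minimal C++ sketch. It is not the actual DDrawCompat patch (that is in the attached diff.txt and may work differently); the helper name fixPrimaryCaps is hypothetical, and the two constants are redefined locally with the DDSCAPS_* values from ddraw.h only so that the example builds without the DirectX headers.

```cpp
#include <cstdint>

// Values mirror the DDSCAPS_* flags from ddraw.h; they are redefined here
// only so that the sketch compiles without the DirectX headers.
constexpr std::uint32_t kCapsPrimarySurface = 0x00000200; // DDSCAPS_PRIMARYSURFACE
constexpr std::uint32_t kCaps3dDevice       = 0x00002000; // DDSCAPS_3DDEVICE

// Hypothetical helper: given the caps requested for a new surface, add
// DDSCAPS_3DDEVICE to a primary (front buffer) surface so that it remains a
// valid render target after Flip. Caps of other surfaces pass through unchanged.
inline std::uint32_t fixPrimaryCaps(std::uint32_t requestedCaps)
{
    if (requestedCaps & kCapsPrimarySurface)
    {
        return requestedCaps | kCaps3dDevice;
    }
    return requestedCaps;
}
```

A wrapper would presumably apply something like this to the surface description before forwarding the surface creation call, so the game's manually attached front buffer ends up with the same cap as its back buffer.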

I suppose it would be too much to hope for this to solve your other rendering issues as well, so I also prepared another experiment. Building on the above patch, I also bypassed the attachment of the z-buffer to the backbuffer: ddraw.zip (diff.txt compared to v0.5.1)

Of course, this won't render correctly, but at least the castle should be partially visible after the logo video, even if it's rendered behind the landscape. If it's not visible at all, then it would indicate that the issue is not with the depth buffer.

Finally, one more difference I noticed in our logs is that yours has the alpha blending state permanently disabled at the beginning, while mine is switching it between on/off frequently. I guess we are using different "graphics cards" in the game's config? Mine is currently set to the Intel i740.

ghotik commented 2 months ago

I made some quick tests (I will test better later) and the preliminary results are the following:

- both patched DDrawCompat v0.5.1 releases fixed the epilepsy-inducing flashing
- none of the patched DDrawCompat releases changed the rendering output (no castle in sight!)
- I was using the Intel i740, so this doesn't explain the differences in the blending state

One thing that puzzles me is that the missing scene textures are not behind other textures in the Z-order; they just seem to be absent. This becomes rather evident when enabling the DxWnd wireframe mode: in the attached pictures you can see one menu and then the scene where the castle should be, but the only texture borders visible are the two triangles of the background and some small textures for the text.

wired1 wired2

narzoul commented 2 months ago

I'm running out of ideas for this one, but I think we should circle back to one of your initial comments a bit:

> I add that I just now read the post in https://github.com/narzoul/DDrawCompat/issues/293 and tried the patched ddraw.dll. Your last fix fixed the Glover black screen too.

It's odd that you need that patch and I don't. The issue with Solaris was that it runs into some floating point stack overflow/underflow, which later causes issues with the std::ceil function returning NaN for perfectly valid input. I now also managed to track down the source of that issue, and if I'm reading the debugger correctly, it was an FP stack underflow in dsound.dll. This shows up also if you check the result of _statusfp(), which will have the SW_INVALID (0x10) bit set, right before the ceil operations start failing.

The thing is, I checked the _statusfp() of all threads in this game that run the presentation logic and the same ceil function, and I never have the SW_INVALID bit set there. If the above patch fixes some issue for you, then I assume your case is different. You should try to monitor the return value of _statusfp() periodically. You can also enable an exception to be thrown when the flag is set, by calling _controlfp(0, EM_INVALID). Ideally, this should be done in every thread, but that might be a bit difficult, and most likely the problem is with the main thread anyway. Then you can catch the instruction that set the flag and maybe find some way to work around it.

You can also reset the FPU with an asm instruction like fninit. If I do this at the beginning of Resource::presentationBlt in DDrawCompat, then it also fixes the problem with Solaris, without using the patch in the other issue.

This is just a theory at this point, but it's possible that the Intel driver is more sensitive to FP issues than others, and this is what causes the missing geometry.
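The flag monitoring suggested above can be sketched in portable C++ with the standard <cfenv> functions. This is an assumed analog and not the MSVC calls named in this comment: std::fetestexcept(FE_INVALID) plays the role of testing SW_INVALID in _statusfp(), and std::feclearexcept roughly that of _clearfp(). The function names are hypothetical, and clearing the flags does not reset the x87 register stack the way fninit does.

```cpp
#include <cfenv>

// Returns true if the "invalid operation" flag is currently raised, the
// <cfenv> counterpart of testing SW_INVALID in the result of _statusfp().
bool invalidFlagSet()
{
    return std::fetestexcept(FE_INVALID) != 0;
}

// Clears all pending floating-point exception flags. Unlike fninit, this
// only touches the status flags, not the x87 register stack.
void clearFpFlags()
{
    std::feclearexcept(FE_ALL_EXCEPT);
}

// Deliberately performs an invalid operation (0/0) at run time. The volatile
// variables keep the compiler from folding the division away.
double provokeInvalid()
{
    volatile double zero = 0.0;
    volatile double result = zero / zero;
    return result;
}
```

Calling invalidFlagSet() at a few fixed points, for example around the presentation call, and logging when it first turns true narrows down which code left the flag behind.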

elishacloud commented 2 months ago

> You can also reset the FPU with an asm instruction like fninit. If I do this at the beginning of Resource::presentationBlt in DDrawCompat, then it also fixes the problem with Solaris, without using the patch in the other issue.

Wow! Adding the asm instruction fninit at the beginning of Present() fixed a long-standing issue I have had in dxwrapper with the Age of Wonders games. I am not sure what the issue was, but this makes the game run flawlessly, whereas before it would crash.

BEENNath58 commented 2 months ago

> I haven't got much further with the depth buffer issue (not being able to reproduce it makes it much more difficult to debug it)

If we are talking about the missing geometry issue, it is most likely to happen with the old Intel drivers (the ones using the old igdumdim.dll or whatever it was called). The problem is reproducible with my Ivy Bridge Intel drivers on both WinXP and Win7.

> This is what causes the epilepsy-inducing flashing, with only every second frame getting rendered. I don't know if this scenario was ever supposed to work without the proper caps, or if it's just that the pre-WDDM runtime/drivers were more forgiving about this omission.

It happens on WinXP too, so it shouldn't be WDDM related. At least the latest fixes have fixed the problem.

> I suppose it would be too much to hope for this to solve your other rendering issues as well, so I also prepared another experiment. Building on the above patch, I also bypassed the attachment of the z-buffer to the backbuffer:

This puts one set of geometry behind the other. Unfortunately I can't take screenshots, because Print Screen takes me out of the game, and Win + Print Screen doesn't paste anything?

More importantly, why does the game use up to 58% of my 6-core Ryzen with DDrawCompat???

ghotik commented 2 months ago

I hope I interpreted the suggestions correctly, so I added the following control code in both the BeginScene and EndScene wrappers:

    DWORD fmask = _statusfp();
    if(fmask) {
        OutTrace("%s: statusfp=%#x\n", ApiRef, fmask);
        _clearfp();
    }
    DWORD imask = _status87();
    if(imask) {
        OutTrace("%s: status87=%#x\n", ApiRef, imask);
        _clear87();
    }

The result seems to confirm your idea: every BeginScene (as far as I could see browsing the log file) shows a situation like this:

...
IDirect3DDevice2::SetRenderState: d3dd=0xc5bd28 dwState=0x1c(FOGENABLE) dwValue=0()
IDirect3DDevice2::SetRenderState: d3dd=0xc5bd28 dwState=0x1(TEXTUREHANDLE) dwValue=0x1e0()
IDirect3DDevice2::EndScene: d3dd=0xc5bd28
IDirect3DDevice2::BeginScene: d3dd=0xc5bd28
IDirect3DDevice2::BeginScene: statusfp=0x80001
IDirect3DDevice2::SetRenderState: d3dd=0xc5bd28 dwState=0x1c(FOGENABLE) dwValue=0()
...

where the statusfp value alternates between 0x80001 and 0x80003, with some rare occurrences of 0x1 or 0x3. I suppose this means that while building the scene the program hit _SW_INEXACT (0x1) or _SW_UNDERFLOW (0x2) conditions, together with a _SW_DENORMAL (0x80000) condition. Because of where the log line is written, the floating point problem must have been caused by something executed earlier, that is, something inside EndScene or between EndScene and the next BeginScene. Sadly, if this is the cause, I fear that the error is not inside the ddraw/d3d calls, where my wrappers could fix it by changing the math libraries. Any suggestion?
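To make status words such as 0x80001 and 0x80003 easier to read in a log, the bits can be decoded into names. The sketch below is only an illustration: describeFpStatus is a hypothetical helper, and the constants are redefined locally with the _SW_* values from the MSVC <float.h> so that the example builds with any compiler.

```cpp
#include <cstdint>
#include <string>

// Bit values as defined for _statusfp() in the MSVC <float.h>; redefined
// here so that the sketch builds with any compiler.
constexpr std::uint32_t kSwInexact    = 0x00000001; // _SW_INEXACT
constexpr std::uint32_t kSwUnderflow  = 0x00000002; // _SW_UNDERFLOW
constexpr std::uint32_t kSwOverflow   = 0x00000004; // _SW_OVERFLOW
constexpr std::uint32_t kSwZeroDivide = 0x00000008; // _SW_ZERODIVIDE
constexpr std::uint32_t kSwInvalid    = 0x00000010; // _SW_INVALID
constexpr std::uint32_t kSwDenormal   = 0x00080000; // _SW_DENORMAL

// Hypothetical helper: turns a status word into a '|'-separated flag list,
// or "none" when no known flag is set.
std::string describeFpStatus(std::uint32_t status)
{
    struct Flag { std::uint32_t bit; const char* name; };
    static const Flag flags[] = {
        {kSwInexact, "INEXACT"},       {kSwUnderflow, "UNDERFLOW"},
        {kSwOverflow, "OVERFLOW"},     {kSwZeroDivide, "ZERODIVIDE"},
        {kSwInvalid, "INVALID"},       {kSwDenormal, "DENORMAL"},
    };

    std::string out;
    for (const Flag& f : flags)
    {
        if (status & f.bit)
        {
            if (!out.empty()) out += '|';
            out += f.name;
        }
    }
    return out.empty() ? "none" : out;
}
```

Decoded this way, neither of the two values logged above contains the INVALID bit (0x10).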

narzoul commented 2 months ago

> If we are talking about the missing geometry issue, that should likely happen in the old Intel drivers (one using the old igdumdim.dll or whatever it was called). The problem is reproducible on my Ivy Bridge Intel drivers on both WinXP and Win7

Yes, that's my problem. I can test 3 different GPU drivers (NVIDIA, AMD, and the new Intel drivers), but not that old Intel one. And none of the ones I can test have this problem.

> This makes one set of geometry behind the other.

Yes, that's expected. I just wanted to make sure it's not some z-buffer test discarding the geometry completely in the bad case.

> Unfortunately I can't take screenshots because PS takes me out of the game, and Win+PS doesn't paste anything?

I don't know about that, print screen works fine for me.

> More importantly, why, with DDrawCompat, the game uses upto 58% of my 6-core Ryzen???

58% of what? A single core or all 6 cores? Measured with what? Unless you changed the default settings, DDrawCompat still restricts games to a single logical CPU core at a time, but the CpuAffinityRotation setting (also enabled by default) rotates that between available physical cores, so maybe that throws off whatever measurement you're using.

> I suppose that this means that while building the scene the program made _SW_INEXACT (0x1) or _SW_UNDERFLOW (0x2) errors giving a _SW_DENORMAL (0x80000) situation.

Those are ok, I think. I get them also and they don't cause any problems. I've only seen problems with _SW_INVALID, but if you don't have those, then I don't understand why the Solaris patch matters at all?

Anyway, I can only suggest some further shotgun debugging steps. In addition to the previous test version changes, I've disabled all texturing and texture transparency in the driver, and also added logging of the D3DTLVERTEX data passed to the primitive drawing functions. The top half of the screen will be fully black, along with all objects, but you should still be able to see the shape of the castle after the logo. If not, send me the debug logs again, maybe comparing the vertex data will reveal something.

ddraw.zip (diff.txt compared to v0.5.1)

ghotik commented 2 months ago

The new ddraw.dll seems to bring some sporadic trouble: at the first try the game crashed immediately, at the second it showed a black screen, and at the third (after I set the LogLevel=Debug configuration) it worked, though very slowly (I suppose because of the huge logs). In any case, I hope that the files I prepared for you were captured correctly and are meaningful. The test conditions are as usual:

- game run in native mode, no DxWnd hooking active
- using your last ddraw.dll fix from the post above
- with the attached DDrawCompat-glover.ini file

The screen output, apart from being awfully slow, was identical to the last tests: no castle in sight (and no top half of the screen fully black ?!?)

DDrawCompat-glover.zip

BEENNath58 commented 2 months ago

> 58% of what? A single core or all 6 cores? Measured with what? Unless you changed the default settings, DDrawCompat still restricts games to a single logical CPU core at a time

CPU usage as reported by Task Manager. This isn't a big deal, except that I have to wait around a minute for Ctrl + Alt + Del to register, and the mouse is as laggy as when running AIDA64 in realtime.

> The top half of the screen will be fully black, along with all objects, but you should still be able to see the shape of the castle after the logo.

It was painful, but did you mean this?

menu glovergame

ghotik commented 2 months ago

Hello again. Being puzzled by the difference between my results, your expectations, and BEENNath58's results, I repeated the experiment on another computer, though again an HP portable probably similar enough to the one used in my previous experiment. Again, the experimental ddraw.dll showed this curious behavior of crashing the game at the first try and working at the second and later attempts, but I suppose this is not important; I report it just in case you would like to know. In this experiment the video result was similar to the screens posted by BEENNath58 (sadly, the computer becomes too slow to take snapshots), with large black areas and grayscale or black textures. I collected a large log file again. Since the computer is very slow and the screen is mostly black, it's not possible to conduct a very short and precise experiment, so I fear that reading the log will be more boring and painful... Then, since rar compression is about twice as efficient as the zip format accepted by this forum for attachments, I compressed the files twice, with WinRAR first and zip afterwards. I hope that these results reveal some clue.

DDrawCompat-Glover-gho(2).zip

narzoul commented 2 months ago

> The screen output, apart being awfully slow, was identical to the last tests, no castle in sight (and no top half of the screen fully black ?!?)

You must have used the wrong ddraw.dll there, because your logs also don't show the new "vertices:" logs I added.

Anyway, if you did see the castle on your other machine, then the issue is not going to be with the vertex data I think. Most likely it's really your textures that are fully transparent. The game does use texture color keying heavily, so I suppose it's possible that most of your textures have only a single color, matching the color key, or maybe color keying is somehow broken in the driver.

Anyway, I think you can disregard the latest test version then, and just use the earlier one. Could you repeat the testing, but with different ColorKeyMethod set? Especially the "none" setting, which should disable texture color keying. You can also change this in-game through the config overlay (shift+f11).

> CPU usage as told by Task Manager. This isn't big deal, until the fact have to wait around a minute for Ctrl + Alt + Del to register, and the mouse is as laggy as running AIDA64 in realtime

Maybe the low level keyboard hook installed for the hot keys in DDrawCompat is not getting any CPU time because another critical thread is keeping the CPU busy. Try disabling the ConfigHotKey and StatsHotKey in the settings, that should prevent the hook from being installed.

ghotik commented 2 months ago

> You must have used the wrong ddraw.dll there, ...

It's possible, but since I'm pretty sure I used your last attachment, could you please post the correct ddraw.dll again (maybe renamed; I'll take care of renaming it back to the required ddraw.dll filename myself)? Is there anything in the log file that tells which DDrawCompat release generated it?

> Anyway, if you did see the castle on your other machine ....

No, I never see the castle on either machine if I use the "Intel i740" configuration choice. I can see the correct scene in only two ways: either by selecting the "No 3D-card" device in the game settings, or by dropping the WineD3D ddraw.dll implementation into the game folder.

BEENNath58 commented 2 months ago

> Try disabling the ConfigHotKey and StatsHotKey in the settings, that should prevent the hook from being installed.

That worked!

> It's possible, but since I'm pretty sure I used your last attach

I see THESE in his logs:

36ac 09:22:19.766 vertices: [{239.5,-0.5,0,1,0,0,0,0},{30,0,0,0,1154609152,3204448256,0,1},{1,0,1,0,1129447424,0,0,0},{239.5,1079.5,0,1,0,1065353216,0,1}]

Isn't that what you meant?

> Anyway, if you did see the castle on your other machine,

He meant to say he saw the black castle with the new ddraw.dll. However the actual castle with proper coloring is still missing behind the red BG.

narzoul commented 2 months ago

Never mind all that, I think I finally found the issue. After checking the vertex data in gho's second set of logs and not finding any meaningful difference there either, I checked the last suspicious thing I could think of: alpha testing.

The game has the ALPHATESTENABLE render state always enabled, but alternates between two different ALPHAFUNCs, D3DCMP_ALWAYS and D3DCMP_GREATER. But it uses a fixed bogus ALPHAREF value of 65488, which is not valid, since the maximum alpha is 255. So whenever the D3DCMP_GREATER comparison is used, I suppose the Intel driver interprets the ALPHAREF value literally, and discards every pixel, because they will never pass that impossible comparison. I verified this on my end by switching D3DCMP_GREATER with D3DCMP_NEVER, and now I get the same rendering artifacts as gho.

Here is the fix, which simply truncates the ALPHAREF value to the low-order byte (I hope this is how the other drivers interpret it): ddraw.zip (diff.txt compared to v0.5.1)

If it works, then it's easy enough to integrate into DxWnd, even without driver hooks, by hooking SetRenderState in IDirect3DDeviceX instead.
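The effect of the bogus reference value and of the truncation can be sketched as follows. This is an illustration under stated assumptions, not the code from the attached diff.txt: filterRenderStateValue and passesAlphaGreater are hypothetical names, the constant is redefined locally with the D3DRENDERSTATE_ALPHAREF value from d3dtypes.h, and the comparison models a driver that takes ALPHAREF literally, as supposed above for the old Intel driver.

```cpp
#include <cstdint>

// Value mirrors D3DRENDERSTATE_ALPHAREF from d3dtypes.h; redefined here only
// so that the sketch builds without the DirectX headers.
constexpr std::uint32_t kRenderStateAlphaRef = 24;

// Hypothetical filter for a SetRenderState hook: ALPHAREF is truncated to its
// low-order byte, every other render state value passes through unchanged.
inline std::uint32_t filterRenderStateValue(std::uint32_t state, std::uint32_t value)
{
    return (state == kRenderStateAlphaRef) ? (value & 0xFFu) : value;
}

// Models a D3DCMP_GREATER alpha test in a driver that uses the reference
// value literally: a pixel survives only if its 8-bit alpha exceeds alphaRef.
inline bool passesAlphaGreater(std::uint8_t alpha, std::uint32_t alphaRef)
{
    return alpha > alphaRef;
}
```

With the game's value of 65488 (0xFFD0), even a fully opaque pixel with alpha 255 fails the literal comparison. After truncation the reference becomes 0xD0, so pixels with alpha above that value pass again.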

ghotik commented 2 months ago

Super! The fix allows running the game with DDrawCompat alone or in combination with DxWnd (see screenshot). Now that the problem is clear, I think it will be easy to add the alpha value trimming also in DxWnd and run the game with DxWnd alone. Thank you so much!

ddc-fix4b

ddc-fix4

BEENNath58 commented 2 months ago

Tried it on my Windows 7, with DDrawCompat and it works very well. Problem seems fixed