narzoul / DDrawCompat

DirectDraw and Direct3D 1-7 compatibility, performance and visual enhancements for Windows Vista, 7, 8, 10 and 11
BSD Zero Clause License

Empire Earth feedback #251

Open EnergyCube opened 7 months ago

EnergyCube commented 7 months ago

Hello and thank you very much for all your hard work on this project!

I tested it on "Empire Earth" which uses DirectDraw / DirectX 7 and suffers from a lot of compatibility problems.

I was amazed at the performance and graphics correction provided by the DLL, it's really just incredible. However, there are a few problems with the GDI part.

Here's what the game is using:

However, the multiplayer lobby suffers from numerous artefacts on opening and during use (notably a trail left behind the mouse, or parts of the screen not updating when the screen content changes suddenly). Sometimes, for some reason, the lobby window doesn't open at all and freezes the game window; we are then forced to leave the game window, and when we come back the multiplayer window works. Another problem is the multiplayer connection initialization overlay, where there is a deadlock (https://github.com/narzoul/DDrawCompat/issues/100). I have the code associated with this connection screen and can help with debugging or provide part of the code if necessary. From my observations, the problem is that WM_DESTROY never arrives in the multiplayer connection overlay's WndProc, so I suspect a problem with the order of the windows in the wrapper that causes a bad transmission of events, or something similar.

The "simplest" fix from an external point of view is to simply not use the GDI part of the wrapper, but the modularity of the wrapper doesn't seem to allow this. I tried to remove this line, but it simply makes the game get stuck on a black screen. What do you recommend? Is it possible to add an option to not use the GDI wrapping (even if this means not having the wrapper's settings overlay, although from my understanding of the DLL it should still be possible)?

I really hope that something can be done because for the moment I can't provide the dll in the game community installer because it creates too many problems, whereas as far as gameplay is concerned the performance and fix are absolutely wonderful.

By the way, I've just tested version v0.5.0 pre-release and I can confirm that all the problems are still there, except perhaps the lobby artifact which seems to be really reduced and less frequent, but still present.

narzoul commented 7 months ago

GDI is a pretty loose term and there are many ties to it in some ways in DDrawCompat, it's not so easy to just disable it altogether. Some parts of it could be decoupled, such as the redirection of GDI draw calls to DirectDraw surfaces, but I'm not sure if this would fix all such problems. I was thinking of adding such a setting, but it's a lot of effort and I didn't have time for it yet. Maybe in a future release.

Do these issues still require NeoEE to be installed? I have bad memories of it last time I tried it. It was blocked by Windows Defender and I'm reluctant to overrule it on my main PC.

If NeoEE is not needed, just a plain install from GOG, then I might check it out. Otherwise, if it works in a VM with DDrawCompat, I might risk installing it there in an isolated environment.

EnergyCube commented 7 months ago

Thanks for your answer and time.

Do these issues still require NeoEE to be installed? I have bad memories of it last time I tried it. It was blocked by Windows Defender and I'm reluctant to overrule it on my main PC.

Only the multiplayer connection initialization overlay deadlock is related to NeoEE (https://github.com/narzoul/DDrawCompat/issues/100). Yes, in the past Windows Defender was detecting NeoEE as a virus. I inherited the code from former community devs and fixed what was causing problems with all anti-virus programs. (It was a false positive related to an executable that was patching the game memory; I moved that patching into a DLL loaded by the game, so now it's perfectly encapsulated.)

By the way, since early 2022 the NeoEE patch has been replaced by a new setup (script src) that performs a complete pre-patched installation, so I invite you to use https://empireearth.eu/download in case you need to install the game again as it is the most popular way to install the game currently. During the installation pick custom install and don't select any DirectX Wrapper since we will test with our own DDrawCompat.

One small piece of information that you'll probably notice very quickly: the multiplayer lobby is actually another window that hides the game's main window. The original developers really added this on the fly... In case it is of interest, the entire code of the multiplayer lobby ended up on GitHub for some unknown reason, which could possibly help (the most interesting part is probably in WONGUI).

How to reproduce

Multiplayer connection initialization overlay

Problem(s): Deadlock and no overlay visible
Info: I can provide the code or test things myself if needed (the code is quite bad since it comes from the former community devs, but since it works without a wrapper, I guess it should also work with one). By the way, the deadlock will make your mouse and desktop unusable, so you'd better have an easy way to close the game process.
Steps: Simply start a single player or multiplayer game with NeoEE

Multiplayer lobby

Problem(s): Sometimes the lobby window isn't visible, or when it works, sometimes some "squares" are visible [image]
Info: You may notice a freeze while connecting to the NeoEE lobby (connecting shouldn't be necessary, but if you do connect you will notice it; it's normal, so don't waste time investigating that)
Steps: Open the game window in a loop until it happens (once you are in the lobby you can repeat the process very quickly by pressing alt+f4 and space: it will close the lobby and re-open it, because it was the previously used button in the game form)

narzoul commented 7 months ago

Here's a fix attempt: ddraw.zip (diff.txt compared to v0.5.1)

The deadlock was actually easy to fix. It was caused by the inputs of the different GUI threads getting attached together, similar to using the AttachThreadInput API, but caused instead by making the game's windows the owners of DDrawCompat's presentation windows, which are created on their own thread. I removed this owner relationship in the patch above. It's a one line change in Gdi/PresentationWindow.cpp. I think this should only break the taskbar thumbnail/live preview fix for windowed applications that was added in v0.5.0, but more extensive testing may be needed to make sure. (And then I need to find some other fix for the thumbnails...)

The next problem was that the NeoEE overlay was showing only as a black screen. This was due to missing support for WS_EX_LAYERED windows that are managed using UpdateLayeredWindow. It was quite a bit more effort to add in support for this, but it seems to work now in the end.

I actually spent most of the time debugging why the overlay wasn't semi-transparent with DDrawCompat, when it was so natively. This turned out to be an issue with the overlay itself, which seems to rely on the Windows 8+ behavior of CreateCompatibleBitmap, which always creates a 32-bit bitmap if the HDC parameter is a display DC. This is due to how emulation of 8/16-bit display modes is implemented (rather incorrectly). DDrawCompat fixes this and returns a bitmap matching the color depth of the emulated display mode in this case, but then the per-pixel alpha information is lost in the bitmap. (I'm also not sure why per-pixel alpha is used in this case, when I assume the overlay has constant alpha for all pixels anyway.)

I wanted to check if my above assumptions are correct by testing it on Windows 7, but I couldn't get the game working in a VM. Do you have such a test system perhaps, or information about whether the overlay is also non-transparent natively (without wrappers) on Windows 7?

You can get back the transparency by commenting out this line: https://github.com/narzoul/DDrawCompat/blob/v0.5.1/DDrawCompat/Gdi/DcFunctions.cpp#L298 But I'm not sure what would be the consequences of this for other games. If my above assumptions are correct, I'd rather leave this as it is, and let it be patched in NeoEE itself (e.g. use CreateBitmap instead, where the color depth can be explicitly specified).

I'm not sure about the "square" issue as I wasn't able to reproduce it even before these fixes, but I hope this got fixed by the deadlock fix also.

I also couldn't find any of those "numerous artefacts" you mentioned in your first post, but maybe I'm looking in the wrong places. How do I reproduce those exactly? Or were those already fixed in v0.5.0?

EnergyCube commented 7 months ago

Thanks for your time on this issue, here is some feedback.

Overlay deadlock

I confirm there are no more deadlocks, that's perfect!

Overlay transparency

The overlay is now indeed visible and transparent as it should be! It seems to work even better than with other wrappers, which often have issues with the window priority.

Here is some code about how the overlay is created in case it might be interesting:

ctx->screen = GetDC(NULL);
ctx->dc = CreateCompatibleDC(ctx->screen);
HBITMAP hBitmap = CreateCompatibleBitmap(ctx->screen, ctx->width, ctx->height);
SelectObject(ctx->dc, hBitmap);
ctx->hGraphics = Graphics::FromHDC(ctx->dc);
ctx->hGraphics->SetSmoothingMode(SmoothingModeAntiAlias);
ctx->hGraphics->SetInterpolationMode(InterpolationModeHighQualityBicubic);

...

hwnd = CreateWindowEx(
    WS_EX_LAYERED | WS_EX_TOOLWINDOW | WS_EX_NOACTIVATE | WS_EX_TOPMOST,
    g_szClassName,
    L"NeoEE RIP Hosting Overlay",
    WS_POPUP | WS_DISABLED,
    0, 0, ctx->width, ctx->height,
    NULL, NULL, 0, ctx);

And here is part of how it's drawn:

BLENDFUNCTION blend = { 0 };
blend.SourceConstantAlpha = 255;
blend.AlphaFormat = AC_SRC_ALPHA;
POINT ptPos = { 0, 0 };
SIZE hSize = { ctx->width, ctx->height };
POINT ptSrc = { 0, 0 }; 

draw(ctx); // draw text and animated point
UpdateLayeredWindow(hwnd, ctx->screen, NULL, &hSize, ctx->dc, &ptSrc, 0, &blend, ULW_ALPHA);                

I have tested on Windows 7 without any DX wrapper and without the DWM8And16BitMitigation compatibility flag, and it appears that the overlay is transparent. As far as I remember from some old tests I did on Windows XP, it was also transparent there. I don't see why the old NeoEE devs would have done all those transparency things if it was not working on the systems of the time (2013, so mostly XP/Vista/7). So if that's the way it should work, you should probably keep the transparency, as I guess it is the behaviour on all Windows systems.

Lobby render glitch

For some reason the render glitch doesn't seem to be present anymore; I'm honestly not sure why. On 0.4.X it was really present, 0.5.0 fixed almost everything but some little squares were still present, and your current version seems to fix them. So good news I guess!

New issues

Sadly your changes also bring new issues:

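  • It's now impossible to leave the game window: using alt-tab or the Windows key keeps the game in front of all other windows and the desktop
  • Going back into the window doesn't work (we have to manually select the right window out of the 2 windows)
  • There are now 2 game windows (that's a really big issue as it may confuse players and streamers)

[Screenshot 2023-11-21 065057]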

Unrelated issue

I wasn't sure if I should open a new issue for this, so please let me know if another issue should be open or if I should continue on this one. I've noticed that DDrawCompat seems to delete all styles from the window, but in some cases it's not necessary to delete the style. For example, when I'm developing a patch for the game, I invoke a console window from the game process like this:

// GetConsoleWindow() returns the HWND of the console, which could perhaps be used to whitelist it
if (GetConsoleWindow() == NULL)
{
    AllocConsole();
}
...redirection stuff...

With DDrawCompat the console looks like this: [Screenshot 2023-11-21 084326] or sometimes like this: [Screenshot 2023-11-23 085601]

But it should look like this: [image]

narzoul commented 7 months ago

ctx->screen = GetDC(NULL);
ctx->dc = CreateCompatibleDC(ctx->screen);
HBITMAP hBitmap = CreateCompatibleBitmap(ctx->screen, ctx->width, ctx->height);

Yes, and I'm pretty sure this should create a 16-bit bitmap without an alpha channel IF the display mode is 16-bit. But now I realize the game can be set to 32-bit color too, and then the transparency works correctly (as I'd expect it). I guess you only tested it with 32-bit color mode on Win 7/XP? Note that when changing the color depth in the game, it needs to be restarted for the change to affect the overlay.

Sadly your changes also bring new issues:

  • It's now impossible to leave the game window: using alt-tab or the Windows key keeps the game in front of all other windows and the desktop
  • Going back into the window doesn't work (we have to manually select the right window out of the 2 windows)
  • There are now 2 game windows (that's a really big issue as it may confuse players and streamers)

Ah, I forgot to test alt-tab. Now I remember why it needed to be an owned window: to remove it from the taskbar without using WS_EX_TOOLWINDOW (and to support taskbar preview). Using WS_EX_TOOLWINDOW again would break OBS window recording again, in addition to taskbar preview: https://github.com/narzoul/DDrawCompat/issues/231#issuecomment-1672026906

I guess there are no quick and easy solutions to solve everything. I'll think about it.

I wasn't sure if I should open a new issue for this, so please let me know if another issue should be open or if I should continue on this one. I've noticed that DDrawCompat seems to delete all styles from the window, but in some cases it's not necessary to delete the style.

I'm guessing this is related to removing visual styles with SetThemeAppProperties in DllMain. This was needed because the GDI redirection code doesn't support themes, and there were a lot of glitches with themes enabled.

Anyway, this goes back to the problem of supporting different GDI interop modes, which is not a small task, but it should be done eventually.

EnergyCube commented 7 months ago

Thanks for your answer.


I guess you only tested it with 32-bit color mode on Win 7/XP?

I just tested with 16-bit on Windows 7 and the overlay was... invisible! It was just empty, no black screen or anything, the overlay was invisible, I could move the mouse, which caused some kind of buffer issue, but when the overlay disappeared, everything went back to normal. So it seems that even if the behavior is weird, you were right, sorry for my bad test, I'll probably change the call to CreateBitmap since it should work on both 16-bit and 32-bit.


Ah, I forgot to test alt-tab. Now I remember why it needed to be an owned window: to remove it from the taskbar without using WS_EX_TOOLWINDOW (and to support taskbar preview).

In DDrawCompat 0.4.0/0.5.0/0.5.1 this is working correctly.

The issue is that with this problem the game is absolutely unplayable: most games last several hours, and being able to temporarily leave the game/lobby is very important. With this issue, the window stays on top of everything and the player is unable to return to the game (clicking on the game window won't work; he'll have to find the 2 windows in the taskbar and select the right one).

I'm not sure breaking WS_EX_TOOLWINDOW for OBS and taskbar preview is worth it if it literally makes some games like Empire Earth unable to minimize. Maybe it should be a config value until it eventually gets fixed?

What can be done? I absolutely can't distribute the wrapper in this state, even if the gameplay fixes work very well (it even manages to fix a blue screen on my AMD card when I'm playing without TnL).


I'm guessing this is related to removing visual styles with SetThemeAppProperties in DllMain. This was needed because the GDI redirection code doesn't support themes, and there were a lot of glitches with themes enabled.

I was wondering if some windows could be whitelisted to keep the style, like the one returned by GetConsoleWindow. But that's really a minor issue, nothing important; I'm only asking in case it is something easy that any game using DDrawCompat could benefit from without breaking anything.

narzoul commented 7 months ago

I just tested with 16-bit on Windows 7 and the overlay was... invisible! It was just empty, no black screen or anything, the overlay was invisible, I could move the mouse, which caused some kind of buffer issue, but when the overlay disappeared, everything went back to normal. So it seems that even if the behavior is weird, you were right, sorry for my bad test, I'll probably change the call to CreateBitmap since it should work on both 16-bit and 32-bit.

Well, the documentation of BLENDFUNCTION states under Remarks that:

When the AlphaFormat member is AC_SRC_ALPHA, the source bitmap must be 32 bpp. If it is not, the AlphaBlend function will fail.

It doesn't seem to explicitly state the same anywhere for UpdateLayeredWindow, but it's probably the same. Or maybe the function itself doesn't return an error, the window just won't show up. I think it doesn't return an error on Windows 11 at least, otherwise it should be invisible with DDrawCompat too. But I didn't implement such an error check for showing the window in fullscreen mode. It's probably fine (or even better) this way.
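To make the 32 bpp requirement concrete, here is a minimal sketch of what the NeoEE-side change discussed above could look like. It is only an illustration under assumptions, not actual NeoEE or DDrawCompat code: the helper names supportsPerPixelAlpha and createOverlayBitmap32 are hypothetical, the Win32 part is only compiled on Windows, and whether a bitmap created this way behaves well in an emulated 16-bit display mode would still need testing.

```cpp
#include <cassert>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: per the BLENDFUNCTION remarks quoted above,
// AC_SRC_ALPHA needs a 32 bpp source bitmap, so an 8-bit or 16-bit
// bitmap cannot carry the per-pixel alpha the overlay relies on.
bool supportsPerPixelAlpha(int bitsPerPixel)
{
    assert(bitsPerPixel > 0);
    return 32 == bitsPerPixel;
}

#ifdef _WIN32
// Hypothetical replacement for the overlay's CreateCompatibleBitmap call:
// CreateBitmap takes the color depth explicitly instead of inheriting it
// from the (possibly emulated 16-bit) display DC.
HBITMAP createOverlayBitmap32(int width, int height)
{
    const UINT planes = 1;
    const UINT bitsPerPixel = 32;
    assert(supportsPerPixelAlpha(static_cast<int>(bitsPerPixel)));
    // No initial pixel data is supplied; the overlay draws into it later.
    return CreateBitmap(width, height, planes, bitsPerPixel, nullptr);
}
#endif
```

In the overlay code shown earlier, the assumed change would be to select the bitmap returned by createOverlayBitmap32(ctx->width, ctx->height) into ctx->dc instead of the one from CreateCompatibleBitmap.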

What can be done? I absolutely can't distribute the wrapper in this state, even if the gameplay fixes work very well (it even manages to fix a blue screen on my AMD card when I'm playing without TnL).

If you just need a quick fix "urgently", I can try something. But to have a fully tested, fully working solution compatible with other games too, that can take a long time (possibly won't be ready this year).

Anyway, I found a quick and dirty solution. Apparently AttachThreadInput can also be used for detaching input queues, so I tried doing that after the ownership relationship is established, and this seems to fix the deadlock too. I found some warnings that under certain (undisclosed) circumstances, Windows might re-attach the threads automatically: https://groups.google.com/a/chromium.org/g/chromium-dev/c/_9jq1ovNF9o (see first comment from Michael Ens)

So far with limited testing, I didn't run into such problems. Alt-tabbing and even taskbar preview works. But it needs more testing to make sure it stays stable this way. You can try it here: ddraw.zip (diff.txt compared to v0.5.1)

I was wondering if some windows could be whitelisted to keep the style, like the one returned by GetConsoleWindow.

I'd rather avoid such whitelists if possible, and just remove redirection for all windows instead, possibly behind some setting. But, more on that when I get to implementing and testing such things. Maybe a setting won't even be needed. I'm not sure if there are any windowed games that really require a shared ddraw/GDI surface. Redirecting only the fullscreen window might be sufficient. This could also solve the deadlock issues, in a cleaner way.

EnergyCube commented 7 months ago

If you just need a quick fix "urgently", I can try something. But to have a fully tested, fully working solution compatible with other games too, that can take a long time (possibly won't be ready this year).

Oh no, I'm sorry, I'm already very happy to see you're still working on this project! Take all the time you need 😄! It's just that I'm pretty excited about the improvements it brings, and it's frustrating that blocking issues which would look "stupid" to the end user keep me from providing it.

Anyway, I found a quick and dirty solution. Apparently AttachThreadInput can also be used for detaching input queues, so I tried doing that after the ownership relationship is established, and this seems to fix the deadlock too. I found some warnings that under certain (undisclosed) circumstances, Windows might re-attach the threads automatically:

I just tested and everything seems to work perfectly! No more issues at all.

I'd rather avoid such whitelists if possible, and just remove redirection for all windows instead, possibly behind some setting. But, more on that when I get to implementing and testing such things. Maybe a setting won't even be needed.

Noted, good to know. I will just stick with it, that's perfectly fine.


Given the changes you have made for the current overall issue, what will you keep for the next version and what will you drop? I'd like to know because I may have to apply the changes to your next versions myself before my problems are eventually fixed.

narzoul commented 7 months ago

Given the changes you have made for the current overall issue, what will you keep for the next version and what will you drop? I'd like to know because I may have to apply the changes to your next versions myself before my problems are eventually fixed.

If further testing doesn't reveal any problems, then everything can be kept as it is. Unless I make some more general GDI interop changes before the next release, but then probably the issues with this game will be solved regardless of what changes.

EnergyCube commented 7 months ago

Okay noted, so far for me it's perfect.


I noticed another issue but regarding input this time. The issue seems present in older DDrawCompat release too.

Here is a video of the in game camera moving with the arrow key without using any DirectX Wrapper: https://github.com/narzoul/DDrawCompat/assets/32596278/b81d7dbf-dc42-4980-a90c-36469443e880 As you can see the camera is fast, the only issue is that when there is "a lot" of units the camera is a little slower.

And here is the same test with DDrawCompat: https://github.com/narzoul/DDrawCompat/assets/32596278/d9518d0a-f5df-4fbe-ad42-ba06c745bd6b The camera is now very very slow and is also impacted by the slowdown caused by units.

Do you have any idea what might be causing this difference with the arrow inputs in DDrawCompat?

Furthermore, although I don't think anything can be done about it (if it is fixable, that would be literally insane, as it has been a major flaw of the game to this day), do you have any idea why the game might slow down so much when displaying these units? It's pretty ridiculous: you can quickly have over 200/300 units on screen when playing, and as you can see in the video, it only takes about 50/100 for the FPS to start dropping drastically (only Wine 8 manages to remain very stable with a lot of units, which should at least mean that it's possible to fix). Considering our modern hardware, this is absurd. Some developers in the community think it might be related to some missing DX 7 features on GPU that make the game do some of the rendering using the CPU (you can check the CPU usage of the game: it's insanely high, and literally reaches 100% when there are "too many" units, causing the FPS to drop), but we don't really know how that works.

Here is the save file to get exactly the same test as me; you simply have to put the .ees file in Empire Earth\Data\Saved Games: testminiperf.zip

narzoul commented 7 months ago

Do you have any idea what might be causing this difference with the arrow inputs in DDrawCompat?

I don't notice a difference in scrolling speed on my system, except that it slows down more with units on screen, which can be fixed by setting CpuAffinity=all in DDrawCompat.ini. Restricting the game to a single core natively produces similar slowdowns without wrappers.

Another thing that can affect performance is that DDrawCompat implements "proper" double-buffered page flipping, like it used to be on old systems, so if FPS is below refresh rate, then it drops to half of the refresh rate (or an even lower integer factor). You can fix this by setting VSync=off in DDrawCompat.ini.
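The half-refresh-rate effect can be illustrated with a little arithmetic. The sketch below assumes an idealized model in which every frame occupies a whole number of refresh intervals; vsyncedFps is a hypothetical helper written for this illustration, not part of DDrawCompat.

```cpp
#include <cassert>
#include <cmath>

// Idealized model of double-buffered page flipping with vsync:
// a frame can only be shown on a refresh boundary, so each frame
// occupies a whole number of refresh intervals (at least one).
// refreshHz: display refresh rate; renderFps: rate the game could
// render at without vsync. Both names are assumptions for this sketch.
double vsyncedFps(double refreshHz, double renderFps)
{
    assert(refreshHz > 0.0 && renderFps > 0.0);
    const double intervalsPerFrame = std::ceil(refreshHz / renderFps);
    return refreshHz / intervalsPerFrame;
}
```

Under this model, on a 60 Hz display a game that could render 59 FPS ends up at 30 FPS, and one that could render 25 FPS ends up at 20 FPS, while anything at or above 60 FPS is capped at 60.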

Performance is still a little better natively, according to the NVIDIA performance overlay. I'll look into it when I have some time, but it might just be that DDrawCompat's optimizations that are useful for other games can not really do much here, and just end up hurting performance a bit.

Furthermore, although I don't think anything can be done about it (if it is fixable, that would be literally insane, as it has been a major flaw of the game to this day), do you have any idea why the game might slow down so much when displaying these units?

I did not see anything unusual after checking a single frame's render calls. I don't think I can optimize it further, but I'm also not really a Direct3D guru.

The only thing that was a little odd to me is that the game is supposed to be using hardware T&L, but seemed to be doing only software vertex processing. I'm not sure if it really supports hardware vertex processing though, or whether it would make a significant difference.

only Wine 8 manages to remain very stable with a lot of units, which should at least mean that it's possible to fix

Wine has a complete reimplementation of ddraw/D3D to OpenGL I think, so performance is not really comparable with the native Windows implementation (which DDrawCompat also builds upon).

some developers in the community think it might be related to some missing DX 7 features on GPU that make the game do some of the rendering using the CPU

This is not really feasible with Direct3D. In order to do any rendering on the CPU, the hardware render target would have to be locked so that the CPU can draw on it, but I didn't see the game doing that. Generally this would only be used for drawing 2D stuff like interface elements anyway.

The only thing the game is doing on the CPU is vertex processing, which might be a possible culprit if the game's really supposed to support hardware vertex processing. But then I don't know why it's not using that path, it's definitely supported by any modern (and even not so modern) GPU.

narzoul commented 7 months ago

The only thing the game is doing on the CPU is vertex processing, which might be a possible culprit if the game's really supposed to support hardware vertex processing.

Hmm, that's strange. When I originally tested it with debug logs enabled, it wasn't using hardware vertex processing. But now on my further tests, it is. Both with debug logs on/off. I'm not sure what happened there. I haven't changed any settings. I can no longer reproduce it unless I force the non-T&L device in the game settings.

Anyway, hardware vertex processing is indeed a little faster, but not that significant. I get between 100-130 FPS either way when I load your save game. Hardware T&L is closer to 120+.

narzoul commented 7 months ago

Good news, I found a way to match native performance by disabling the use of dynamic index buffers. This is one of those optimizations I added a long time ago, but it turns out it was only decreasing performance after all, at least in this game. It also looks like dynamic vertex buffers don't really make a difference either, so I might as well turn those off too.

Now performance is closer to 140 FPS average upon loading your save game. If these improvements turn out to be consistent across other games and hardware, I could throw out a bunch of unnecessary code. For now I just commented out a single line in DynamicBuffer.cpp to make sure dynamic index and vertex buffers are not used: ddraw.zip (diff.txt compared to v0.5.1)

EnergyCube commented 7 months ago

I don't notice a difference in scrolling speed on my system. which can be fixed by setting CpuAffinity=all

Well as you can see on my video it's real and massive, I don't understand why you don't have it. Maybe your game is running so well that you can't notice it. I'm not sure why CpuAffinity would help, I mean the game is single core so.

You can fix this by setting VSync=off in DDrawCompat.ini.

I'm already using that setting.

The only thing that was a little odd to me is that the game is supposed to be using hardware T&L

This is something you can enable or disable in the game settings, but we often don't recommend that players use T&L because it usually consumes 20% more CPU, and considering how much CPU is required to get just 60 FPS... well, we'd better not use it. Sadly, the performance of T&L really depends on the hardware: we have people saying that one or the other is slower, or that one just crashes (like me: not using T&L crashes my game, but with DDrawCompat it works and I get more FPS, as I can recover the CPU power that would otherwise be used by T&L).

The only thing the game is doing on the CPU is vertex processing

Well, vertex processing must be really bad in the game, because any model that uses over 2400 vertices can crash the game (ridiculous considering our hardware).

Good news, I found a way to match native performance by disabling the use of dynamic index buffers. This is one of those optimizations I added a long time ago, but it turns out it was only decreasing performance after all, at least in this game. It also looks like dynamic vertex buffers don't really make a difference either, so I might as well turn those off too.

Are you sure about this? My performance is exactly the same with this new DLL, literally nothing changed either with or without T&L.

narzoul commented 7 months ago

I'm not sure why CpuAffinity would help, I mean the game is single core so.

It helped when I tried it, but then again it was the same way with a couple of things today at first, so I'm not sure what to believe anymore. Anyway, give it a try.

Well, vertex processing must be really bad in the game, because any model that uses over 2400 vertices can crash the game (ridiculous considering our hardware).

That's not so surprising, considering the game creates vertex buffers with only enough space for 2442 vertices, at least according to my debug logs.

But now I realize maybe you have more mods installed than I do. I kept it to a minimum and only installed NeoEE at first to reproduce your original issues, then added DrexMod 2 too when I realized your zoom level is not the same as mine. What else have you selected? Maybe you should send a screenshot so I can select the same things. Also I assume your in-game settings are mostly the defaults (which seems to max everything) except for resolution and color depth, which I assume is 1920x1080x32.

Are you sure about this? My performance is exactly the same with this new DLL, literally nothing changed either with or without T&L.

I was sure about it half an hour ago, I even tested it back and forth a few times to make sure it wasn't a fluke. But now, I can't reproduce any difference anymore either.

Instead, I noticed my game somehow changed to 16-bit color mode while I was testing (I noticed because of the missing overlay transparency), and now instead I get the same performance difference by switching between 16-bit and 32-bit color modes. I also restart the game after changing the setting to make sure it applies, but not sure if that's necessary.

EnergyCube commented 7 months ago

It helped when I tried it, but then again it was the same way with a couple of things today at first, so I'm not sure what to believe anymore. Anyway, give it a try.

I tried it anyway, despite what I said, and nothing changed.

But now I realize maybe you have more mods installed than I do. I kept it to a minimum and only installed NeoEE at first to reproduce your original issues, then added DrexMod 2 too when I realized your zoom level is not the same as mine. What else have you selected? Maybe you should send a screenshot so I can select the same things. Also I assume your in-game settings are mostly the defaults (which seems to max everything) except for resolution and color depth, which I assume is 1920x1080x32.

To get the same settings as 90% of current players, you would just have to use the Community Setup without touching any settings. Since you already installed it, I would simply recommend switching to dreXmod 3.3, as it is the default version provided. You should then get the same settings as everyone, including me. I know that for this kind of thing a perfectly vanilla game is better, but nowadays we all use dreXmod. I have its source code, so even if something is wrong with it I can fix it, but technically there shouldn't be.

considering the game creates vertex buffers with only enough space for 2442 vertices

We actually manage to load up to 2600, but it can make the game very unstable. Maybe we can do it because the model has 3 parts, I'm not sure. Is there any way to increase that buffer?

I was sure about it half an hour ago, I even tested it back and forth a few times to make sure it wasn't a fluke. But now, I can't reproduce any difference anymore either.

Well, you're experiencing the awesome compatibility and performance randomness of the game that I've been trying to fix for over 5 years now ahah

For testing, you should use 1920x1080 for both the menu and in-game, with a 32-bit display, and test both with and without TnL, keeping in mind that the setup will most probably avoid using TnL because it uses way more CPU.

narzoul commented 7 months ago

To get the same settings as 90% of current players, you just have to use the Community Setup without touching any settings. Since you already installed it, I would simply recommend switching to dreXmod 3.3, as it is the default version provided. You should then get the same settings as everyone else, including me. I know that a perfectly vanilla game is better for this kind of testing, but nowadays we all use dreXmod. I have its source code, so even if something is wrong with it I can fix it, but technically there shouldn't be.

I was already using the Community Setup but with the custom install setting. I changed it to dreXmod 3 now, but it didn't seem to make any difference, I get the same FPS values as before. According to the DDrawCompat stats overlay's avg flip rate, it's something like this:

No T&L: 16-bit: 103-116, 32-bit: 97-102

T&L: 16-bit: 125-129, 32-bit: 110-116

We actually manage to load up to 2600, but it can make the game very unstable. Maybe we can do it because the model has 3 parts, I'm not sure. Is there any way to increase that buffer?

Well, maybe the total space (2442x36 or 32, depending on T&L on or off) gets rounded up to a page boundary and you have a little more space before you either trash some other data in memory, or run into an access violation (crash).
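To put rough numbers on the page-rounding idea above, here is a minimal sketch. It assumes a 4096-byte page and a buffer that starts exactly on a page boundary; neither is confirmed in this thread, and the real allocator may well use a different granularity. `slack_vertices` is a hypothetical helper name.

```python
PAGE_SIZE = 4096  # assumption: typical x86 page size, not confirmed for this allocation


def slack_vertices(vertex_count, stride, page_size=PAGE_SIZE):
    """Whole extra vertices that fit between the end of the buffer and the next page boundary."""
    used = vertex_count * stride
    rounded = -(-used // page_size) * page_size  # round up to a multiple of page_size
    return (rounded - used) // stride


for stride in (36, 32):  # the two vertex sizes mentioned above
    print(stride, 2442 * stride, slack_vertices(2442, stride))
# 36 87912 61
# 32 78144 118
```

Under these assumptions the slack is 61 or 118 vertices, which falls short of the roughly 2600 reported, so page rounding alone may not account for all of the headroom.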

The value is hardcoded in DX7HRTnLDisplay.dll in 2 places, the first one seems to be for system memory vertex buffers and the second for video memory vertex buffers. On my system only the second one is hit, the first is probably only some fallback code, but it might be better to replace both.

Search for C7 45 FC 8A 09 00 00 and replace 8A 09 with whatever, remember that it's little-endian. Max value is FF FF even according to the DX7 SDK, so don't override the following 00s.
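The search-and-replace described above can be scripted. This is only a sketch: `patch_vertex_limit` is a hypothetical helper, and it relies on the value 2442 being stored as the little-endian dword `8A 09 00 00` right after the `C7 45 FC` bytes quoted above. It refuses values above FF FF so the trailing zero bytes stay untouched.

```python
import struct

PREFIX = bytes.fromhex("C7 45 FC")  # bytes that precede the value, as quoted above
OLD_COUNT = 2442                    # 0x098A, stored little-endian as 8A 09 00 00


def patch_vertex_limit(data, new_count):
    """Return (patched_bytes, number_of_occurrences_replaced)."""
    if not 0 < new_count <= 0xFFFF:
        raise ValueError("new_count must fit in 16 bits (max FF FF)")
    old = PREFIX + struct.pack("<I", OLD_COUNT)
    new = PREFIX + struct.pack("<I", new_count)
    return data.replace(old, new), data.count(old)


# Usage sketch (work on a copy of the DLL, never the original):
#   data = open("DX7HRTnLDisplay.dll", "rb").read()
#   patched, hits = patch_vertex_limit(data, 0xFFFF)
#   open("DX7HRTnLDisplay_patched.dll", "wb").write(patched)
```

Checking the returned hit count against the expected number of occurrences (2 in DX7HRTnLDisplay.dll according to the comment above) is a cheap sanity check before writing the file.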

Well, you're experiencing the awesome compatibility and performance randomness of the game that I've been trying to fix for over 5 years now ahah

Yes, sometimes after restarting the game, with T&L and 16-bit I get around 140 FPS instead of 125-130. So I guess indeed I just had an unlucky coincidence when testing dynamic buffers on/off and it actually didn't change anything.

Well, like I said I'm not a Direct3D expert, even less so when it comes to performance. But at least from looking at the T&L debug logs, it seems to me the game engine is reusing the same one vertex buffer for everything rendered during a frame. So if it has to render 300 of the same unit model, I'd assume it refills that buffer and passes it to the driver 300 times. With software vertex processing (no T&L) it should be even worse, because all instances need to be processed by the CPU before passing them to the driver. I assume the enhanced zoom level allows more vertices to fit on screen than was originally possible, compounding the issue.

It might give some performance improvement to store all vertex data in a bunch of constant vertex buffers instead, preferably in video memory, so it doesn't have to be reuploaded every time. But this can only be done on the game engine side, not from DDrawCompat, so I can't really help with it. I also don't know how much this would matter in the end.

the setup will most probably avoid using TnL because it uses way more CPU

I don't understand this. At least in theory, TnL should consume less CPU, because it pushes vertex transformations to be done on the GPU, freeing the CPU from doing vertex computations on every single vertex first. But I haven't really looked at the differences between the TnL and non-TnL versions yet, maybe TnL does something else in a worse way.

narzoul commented 7 months ago

The value is hardcoded in DX7HRTnLDisplay.dll in 2 places, the first one seems to be for system memory vertex buffers and the second for video memory vertex buffers. On my system only the second one is hit, the first is probably only some fallback code, but it might be better to replace both.

Same value is also present in DX7HRDisplay.dll, but only once, with system memory vertex buffer configuration. Maybe this is what causes the higher CPU overhead for the TnL version, since it needs to lock a video memory vertex buffer instead. But if this is the case, then it should be fixed by the default settings in DDrawCompat, because it has a VertexBufferMemoryType=sysmem override, exactly to reduce the lock overhead in games that frequently lock vertex buffers in inefficient ways.

narzoul commented 7 months ago

Well, it turns out my assumption about the frequent locks on a video memory vertex buffer crippling T&L performance was correct. I did some more tests without wrappers, using the NVIDIA performance overlay. I also used the DXPrimaryEmulation -DisableMaxWindowedMode shim to make the game run in exclusive fullscreen mode, otherwise FPS was limited to refresh rate (except when moving the mouse), no matter if the in-game Vertical Sync setting was on or off.

All tests were done at 1920x1080 32-bit, measuring FPS for a few seconds after loading your save game, without moving the mouse.

Without T&L: ~205 FPS
With T&L: ~105 FPS

Then, I edited DX7HRTnLDisplay.dll to force the vertex buffer into system memory for T&L (like the game already does without T&L). I used the following replacement: C7 45 F4 00 00 01 00 -> C7 45 F4 00 08 01 00

Now T&L performance boosted significantly to ~315 FPS. This change can also be combined with the increased vertex buffer size to prevent instabilities with vertex counts above 2442.
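One way to read the 4 bytes changed by the replacement above is as a little-endian flags dword. The sketch below assumes it is the `dwCaps` field of a vertex buffer description and uses the `D3DVBCAPS_*` values as I recall them from the DirectX 7 SDK header `d3dtypes.h`; both are assumptions that the thread does not confirm, and `decode_caps` is a hypothetical helper.

```python
import struct

# Assumed values from the DirectX 7 SDK (d3dtypes.h); verify against your own headers.
D3DVBCAPS = {
    0x00000001: "D3DVBCAPS_DONOTCLIP",
    0x00000800: "D3DVBCAPS_SYSTEMMEMORY",
    0x00010000: "D3DVBCAPS_WRITEONLY",
    0x80000000: "D3DVBCAPS_OPTIMIZED",
}


def decode_caps(hex_bytes):
    """Decode 4 little-endian bytes into (value, list of flag names that are set)."""
    (value,) = struct.unpack("<I", bytes.fromhex(hex_bytes))
    names = [name for bit, name in sorted(D3DVBCAPS.items()) if value & bit]
    return value, names


print(decode_caps("00 00 01 00"))  # original bytes
print(decode_caps("00 08 01 00"))  # patched bytes
```

If the assumption holds, the patch only adds the system memory flag (0x800) on top of the existing write-only flag, which would match the "force the vertex buffer into system memory" description.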

DDrawCompat is actually significantly slower, both with or without T&L. It's almost half without it, and less than half with it, I don't know why.

narzoul commented 7 months ago

I figured out what causes the inconsistent performance. When the game is running slow, it's redrawing some text elements using GetDC after every frame. This happens both with the modded and unmodded versions.

When it's fast, it only does this every 5 seconds with mods installed, and even less frequently (and inconsistently) without mods.

GetDC requires pulling the render target into system memory so that GDI can draw text on it, so this can tank performance a bit if it's done every frame. I don't know how to fix this in the game engine, but in theory, this could be optimized in DDrawCompat to avoid copying the whole render target back and forth. I'll play with it when I have some time.

EnergyCube commented 7 months ago

I'll do everything I can to stay calm, but I want you to know that I'm super happy right now and that the Empire Earth community is grateful for the time you're taking on those issues.

Just in case you're investing more time and need to test things out, you can activate the cheat codes when creating a game, and use the very useful one : my name is methos (instant construction and units, see the whole map and inf resources).

Search for C7 45 FC 8A 09 00 00 and replace 8A 09 with whatever, remember that it's little-endian. Max value is FF FF even according to the DX7 SDK, so don't override the following 00s.

Then, I edited DX7HRTnLDisplay.dll to force the vertex buffer into system memory for T&L (like the game already does without T&L). I used the following replacement: C7 45 F4 00 00 01 00 -> C7 45 F4 00 08 01 00

I patched both DLLs with the model number unlocker, and only the T&L one with the memory fix. I also edited the exposed name string, then renamed both files and added them to the game directory: image It's a useful way to do it, so feel free to do the same.

Model vertex override (performance impact & crash)

Your vertex buffer unlocker works! We've been able to add more complex 3D models to the game, which previously always crashed or destroyed performance! One very interesting thing is that units no longer disappear. Before, when there were too many units on screen, the game would glitch and not render some of them, maybe because the buffer was full, or maybe it was intentional? Anyway, the game now displays literally all units and handles over 100 complex units without any problems ("complex" meaning 3,500 vertices). Here is the 3D model in case you plan to do some stress testing: Tsar_Tank.zip And here is a map I use with it for benchmarking; on my laptop I get 40 FPS with T&L and the model number unlocked: perfdestroyer2.zip

How to install and spawn it?
Then start a solo game from Age 10 (X) and build a tank factory. The unit is still shown as the vanilla one, but you should get the modded model & texture:

We also tried to load very high-poly models, since the limit should "technically" be 65,535 (FF FF). We tried adding a building with 21,768 vertices, and it crashed. Then 10k, which also crashed. Then 4,211, which also crashed, and finally 3,461 (fewer than the tank I just shared with you, which has a little more than 3,500), and surprisingly that crashed too. So maybe buildings don't have the same limit as units? To check, we created a new unit model with 8k vertices, and it also crashed. So from a performance point of view your patched values are perfect, and they sometimes allow higher-vertex models to load, but... not always? In case you want to check, here are some additional models:

Over 8k vertices, crashing (Unit, Age 12, first tank) M1A1_Abrams.zip

Prehistoric Temple (Building, Age 1; all of them crash, even the 3,461 one, which is weird considering the Tsar Tank works with over 3,500 vertices). There is only a 3D model for this one because we didn't take the time to export the texture, but it should work without it; it will just look weird if it works. Prehistoric_Temple_3461.zip Prehistoric_Temple_4211.zip Prehistoric_Temple_10k.zip Prehistoric_Temple_20k.zip

I'm not sure about what I'm going to say, but it seems that the game has its own limitation somewhere, and that the DLL patch on the number of vertices might actually "just" allow the rendering plugin to wait for the game's limit. I say this because since I patched the DLL, the complex models run without any slowdown, and if you add too many vertices the game crashes without any slowdown beforehand. Without the vertex count patch, it was clear that the game would die a slow death, because the more vertices you added, the slower it got.

T&L vertex buffer into system memory

Forcing the T&L vertex buffer into system memory doesn't seem to change anything for me or for the other players who tested it; what actually improved performance is the vertex number unlocker. As you can see, I have one rendering plugin with only the vertex count patched (T&L) and another with both the vertex count and vertex memory patched (T&L). Both renderers have exactly the same performance. The CPU usage is also exactly the same; I don't see much difference anyway.

Speaking of CPU usage, when the rendering plugin isn't patched, the CPU where the game runs can reach 100% but when it is patched, it doesn't reach more than 80% even if FPS start to drop (and the GPU isn't used at 100% either), I don't know why we don't get the extra juice from those remaining % of CPU and/or GPU.

it seems to me the game engine is reusing the same one vertex buffer for everything rendered during a frame. So if it has to render 300 of the same unit model

I don't really know either, but we do know that the game does some very weird things that make it unoptimized when it could be working properly, vertex number unlocking being a very good example. Unfortunately, we have no knowledge of DirectX and even less of Direct3D 7/DirectDraw, so correcting this kind of thing, even though it could increase performance by 10, is impossible for us.

2D render issues

I figured out what causes the inconsistent performance. When the game is running slow, it's redrawing some text elements using GetDC after every frame. This happens both with the modded and unmodded versions.

I'm glad you noticed this. In fact, players on a lot of computers report that the game interface slows down the game. If that's what you're talking about, it happens most of the time when you select units, or when an in-game message appears in the chat overlay in the top left (press Enter in game to show the input and send messages). It can also be caused by the game's FPS overlay (F11 key) or when you're under attack (the blue cross on the minimap), but that's rare compared to what I mentioned above. Perhaps the entire game HUD is slowing things down in general, but we don't notice it because it's always there, unlike the things I've just described. If this is fixable, it would once again fix one of the game's major performance flaws. It's really annoying because players with a good configuration have a stable game but suffer massive slowdowns from these little 2D renders.

When it's fast, it only does this every 5 seconds with mods installed, and even less frequently (and inconsistently) without mods.

Very surprising, because dreXmod doesn't change anything in the game's rendering. The only related things may be the score HUD on top of the minimap, the 32-bit display check bypass (the game usually spawns a window to test it; from Windows 8 onward, dreXmod just returns true from the game function DX7Screen::Test32BitSupport), or maybe just the camera, which is much higher with dreXmod.

DDrawCompat performance difference

DDrawCompat is actually significantly slower, both with or without T&L. It's almost half without it, and less than half with it, I don't know why.

Everything I said in this message was done without DDrawCompat (that's why I can't test the non-T&L version: natively, I get a blue screen if I try to load a game with it). I ran one of the stress tests with DDrawCompat, and it does indeed make me lose 30-50% of the performance, which is really sad. I did notice something strange, though: the CPU usage looks like a rollercoaster. Combining DDrawCompat, which fixes compatibility issues, with those new "performance fixed" DLLs would have been awesome. I can also confirm that I still have that weird input slowdown for camera movement, and I noticed that map loading is almost instant with DDrawCompat, which is insane.

Exclusive fullscreen & VSync

I also used the DXPrimaryEmulation -DisableMaxWindowedMode shim to make the game run in exclusive fullscreen mode, otherwise FPS was limited to refresh rate

Adding that flag doesn't let me get more than 60 FPS (my refresh rate); there is apparently no way to unlock it from what I see. In DDrawCompat 0.4.0 the overlay allowed me to disable it, and it actually worked, but in 0.5.1 the option is not there anymore, and for some reason even adding VSync=off to the config doesn't seem to work. So I'm stuck at my refresh rate. If there is any way to fix this, I can create a patch for the game.


I'd like to thank you once again for your efforts, as this discovery motivates our modders to create new content and will make a lot of players able to play more fluidly! Sorry if the message is too large I hope most things are clear.

narzoul commented 7 months ago

And here is a map I use with it for benchmark, on my laptop I have 40 FPS with T&L and models number unlocked

My numbers on a desktop PC with Core i7 12700 + NVIDIA GeForce RTX 3060 Ti:

Direct3D Original: 57
Direct3D Vertex: 57

Direct3D Original TnL: 18
Direct3D OnlyVertex TnL: 18
Direct3D OnlyMem TnL: 170
Direct3D Vertex&Mem TnL: 170

The vertex buffer into system memory for T&L doesn't seem to change anything for me and other players that tested, the thing that has improved performance is actually the vertex number unlocker. As you can see, I have one rendering plugin with only the vertex count patched (T&L) and another with the vertex count and vertex memory (T&L). Both renderings have exactly the same performance.

Well it's the opposite for me, the vertex number doesn't change performance, but the memory fix does. Are you testing on integrated GPUs? You mentioned you had a laptop, but not the specs. Even if you have a dedicated NVIDIA GPU in it, it might be using only an integrated Intel chip, see here: https://nvidia.custhelp.com/app/answers/detail/a_id/3733/~/games-based-on-directx-8-and-older-versions-of-directx-will-only-run-on You could check by looking at the "Hooking user mode display driver" line in DDrawCompat logs. If it loads igdumdim32.dll, then it's using an integrated Intel GPU.

I'm not sure why the vertex change helps anyway, unless it's maybe already running into exceptions due to buffer overflow, which are getting suppressed by some shim (e.g. IgnoreExceptions) which just slows things down instead of causing a crash. The sysmem/vidmem distinction possibly doesn't make much difference on integrated GPUs, since practically those are already using shared system memory as "video memory".

On dedicated GPUs, I imagine that the frequent vertex buffer locks are not only a problem because of the buffer residing in video memory, thus possibly needing transfer between system and video memory via PCI-E, but also because each lock requires the GPU to flush the previous rendering commands and wait for them to complete, so that the CPU can overwrite the vertex buffer contents without affecting any previously issued rendering. Dedicated GPUs don't like to work in small, synchronized batches like that.

I'm not sure what I'm going to say but it seems that the game has its own limitation somewhere and that the dll patched on the number of vertex might actually "just" allow the rendering plugin to wait for the game's limit.

Yes, it's quite likely there are other limitations in the engine. It's filling those vertex buffers with data from somewhere else, and those source buffers themselves could also be limited. I found the 8A 09 00 00 number in quite a few other places, maybe some of them are related. I'm not really into modding, but I'll check it if I have some time, maybe it's easy to tell what other numbers would need to be updated.

I'm glad you noticed this, in fact on a lot of computers players report that the game interface slows down the game, if that's what you're talking about,

During my last debug run it was outputting the following texts during each GetDC cycle:

4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 905, 0, 4, *{600,0,1319,30}, "Dark Age", 8, null)
4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 905, 0, 4100, *{600,0,1319,30}, "Dark Age", 8, null)
4f98 21:55:47.930   < ExtTextOutW(*DC{&AC0126B1,null}, 905, 0, 4100, *{600,0,1319,30}, "Dark Age", 8, null) = 1
4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 905, 0, 4, *{600,0,1319,30}, "Dark Age", 8, null) = 1
4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 1462, 987, 4, *{1450,987,1549,1012}, "125/250", 7, null)
4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 1462, 987, 4100, *{1450,987,1549,1012}, "125/250", 7, null)
4f98 21:55:47.930   < ExtTextOutW(*DC{&AC0126B1,null}, 1462, 987, 4100, *{1450,987,1549,1012}, "125/250", 7, null) = 1
4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 1462, 987, 4, *{1450,987,1549,1012}, "125/250", 7, null) = 1
4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 1249, 987, 4, *{1248,987,1345,1012}, "96850", 5, null)
4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 1249, 987, 4100, *{1248,987,1345,1012}, "96850", 5, null)
4f98 21:55:47.930   < ExtTextOutW(*DC{&AC0126B1,null}, 1249, 987, 4100, *{1248,987,1345,1012}, "96850", 5, null) = 1
4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 1249, 987, 4, *{1248,987,1345,1012}, "96850", 5, null) = 1
4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 1045, 987, 4, *{1044,987,1141,1012}, "100000", 6, null)
4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 1045, 987, 4100, *{1044,987,1141,1012}, "100000", 6, null)
4f98 21:55:47.930   < ExtTextOutW(*DC{&AC0126B1,null}, 1045, 987, 4100, *{1044,987,1141,1012}, "100000", 6, null) = 1
4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 1045, 987, 4, *{1044,987,1141,1012}, "100000", 6, null) = 1
4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 840, 987, 4, *{839,987,936,1012}, "100000", 6, null)
4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 840, 987, 4100, *{839,987,936,1012}, "100000", 6, null)
4f98 21:55:47.930   < ExtTextOutW(*DC{&AC0126B1,null}, 840, 987, 4100, *{839,987,936,1012}, "100000", 6, null) = 1
4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 840, 987, 4, *{839,987,936,1012}, "100000", 6, null) = 1
4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 635, 987, 4, *{634,987,731,1012}, "98875", 5, null)
4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 635, 987, 4100, *{634,987,731,1012}, "98875", 5, null)
4f98 21:55:47.930   < ExtTextOutW(*DC{&AC0126B1,null}, 635, 987, 4100, *{634,987,731,1012}, "98875", 5, null) = 1
4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 635, 987, 4, *{634,987,731,1012}, "98875", 5, null) = 1
4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 431, 987, 4, *{430,987,527,1012}, "96850", 5, null)
4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 431, 987, 4100, *{430,987,527,1012}, "96850", 5, null)
4f98 21:55:47.930   < ExtTextOutW(*DC{&AC0126B1,null}, 431, 987, 4100, *{430,987,527,1012}, "96850", 5, null) = 1
4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 431, 987, 4, *{430,987,527,1012}, "96850", 5, null) = 1

Those look like the text at the top of the screen, and the resource numbers at the bottom. But yes, there are probably others too that can be added depending on UI state or other actions.
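To see which strings are redrawn in each GetDC cycle without reading the log by eye, the entries can be filtered with a small script. This is a sketch built only on the log lines shown above: it keeps the outer `ExtTextOutA` call entries (lines containing `> ExtTextOutA(`) and skips the nested `ExtTextOutW` calls and the return lines. The regular expression is an assumption about the format and may need adjusting for other DDrawCompat log lines.

```python
import re

# Matches the call-entry lines for ExtTextOutA as they appear in the excerpt above.
CALL = re.compile(
    r'> ExtTextOutA\(\*DC\{[^}]*\}, (-?\d+), (-?\d+), \d+, \*\{[^}]*\}, "(.*)", \d+, '
)


def text_draws(log_lines):
    """Return (x, y, text) for every outer ExtTextOutA call in the given log lines."""
    draws = []
    for line in log_lines:
        match = CALL.search(line)
        if match:
            draws.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return draws


sample = [
    '4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 905, 0, 4, *{600,0,1319,30}, "Dark Age", 8, null)',
    '4f98 21:55:47.930   > ExtTextOutW(*DC{&AC0126B1,null}, 905, 0, 4100, *{600,0,1319,30}, "Dark Age", 8, null)',
    '4f98 21:55:47.930 < ExtTextOutA(*DC{&AC0126B1,null}, 905, 0, 4, *{600,0,1319,30}, "Dark Age", 8, null) = 1',
    '4f98 21:55:47.930 > ExtTextOutA(*DC{&AC0126B1,null}, 1462, 987, 4, *{1450,987,1549,1012}, "125/250", 7, null)',
]
print(text_draws(sample))
# [(905, 0, 'Dark Age'), (1462, 987, '125/250')]
```

Grouping the result by timestamp or counting repeated strings would then show how often each HUD element is redrawn.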

The usual solution for faster locking of the render target for such CPU updates is to use a lower in-game resolution, and rely on the resolution scaling of wrappers to upscale the 3D parts instead. This decreases the amount of data that needs to be transferred between system and video memory, since only the internal low-resolution image needs to be transferred. Of course, it also decreases the quality of whatever is drawn using the CPU (e.g. text in this case), and possibly changes the UI size if the game doesn't scale it with resolution. This is often a good thing, since the UI won't become too small at large resolutions.
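As a rough illustration of the data volume involved, the sketch below compares the size of one 32-bit frame at the two resolutions mentioned in this thread. It assumes the whole render target is copied, tightly packed, with no row padding; the actual surface pitch and the number of copies per frame are not known from this thread.

```python
def frame_bytes(width, height, bits_per_pixel):
    """Size in bytes of one tightly packed frame (assumes no pitch padding)."""
    return width * height * bits_per_pixel // 8


full_hd = frame_bytes(1920, 1080, 32)
low_res = frame_bytes(800, 600, 32)

print(full_hd)            # 8294400
print(low_res)            # 1920000
print(full_hd / low_res)  # 4.32
```

So under these assumptions each round trip of the render target moves about 4.3 times less data at 800x600 than at 1920x1080.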

In NeoEE, it looks like there are some texts that get very poor quality when running at low resolutions like 800x600, however. I guess the mod was only made with full HD and higher resolutions in mind.

When I mentioned GetDC earlier, I thought only text was rendered on the CPU, but you mentioned a few other things, and now I'm no longer sure if all of them can be optimized, even from a wrapper. I'll have to check how those other things are drawn first.

Adding that flag doesn't let me get more than 60 FPS (my refresh rate); there is apparently no way to unlock it from what I see

Is this with or without DDrawCompat? The wrapper always limits display FPS to refresh rate, so you won't get more on some overlays with it, depending on what is measured. But in DDrawCompat's own stats overlay, you can see the full flip rate, which can be higher than the present rate ("Display FPS").

Also, I assume you added the shim (DXPrimaryEmulation -DisableMaxWindowedMode) with Compatibility Administrator, just to make sure we're talking about the same thing. Note that DDrawCompat overrides this as well with its own FullscreenMode setting.

The shim (or FullscreenMode=exclusive in DDrawCompat) is needed because otherwise DWM is doing the presentation, which itself is also limited to the refresh rate. In exclusive mode, the in-game Vertical Sync should also work fine, at least it does for me (without DDrawCompat).

on DDrawCompat 0.4.0 the overlay allowed me to disable it and it actually worked but now in 0.5.1 it's not here anymore

You can scroll the overlay with the vertical scroll bar on the right side, or with the mouse wheel.

EnergyCube commented 7 months ago

Oh, you just posted a message at the same time as me. I deleted the one I sent at the same time as yours and merged it into this message to keep the chronology. I'm going to read your message right now.


After more tests from the modding community, it appears that my guess was kind of right.

it seems that the game has its own limitation somewhere and that the dll patched on the number of vertex might actually "just" allow the rendering plugin to wait for the game's limit

The game doesn't crash if you don't exceed 2442 vertices per part, so the way to add a unit with 5000 vertices is to split the 3D model. This explains why the one with 3500 vertices works: none of its parts has more than 2442 vertices.

Here is the tank that was crashing, but this time with a split 3D model: M1A2_Abrams_5772.zip It works. Without the patched render DLL the game runs at 1-5 FPS, and with the patched render DLL it runs much better.

Here is also a very detailed building with over 10k vertices (available to citizens after advancing 1 or 2 ages, or by placing it in the scenario editor): Colosseum_replacement_10362_vertex.zip

Or an even more complex one with over 40k vertices (Age 1): Prehistoric_Temple_46392_vertex.zip

With 100 units of the tank I've just shared (M1A2_Abrams_5772.zip), i.e. around 570,000 vertices on screen, I'm getting around 30 FPS. With VSync on, my game starts to slow down from 30 units (so about 171,000 vertices on screen). Do you "consider" this normal, or are there possibly other things to do to get better performance? The game is an RTS, so if you make units with more vertices, you can quickly get a lot of vertices on the screen. Also, for some reason my game starts to slow down after 9 temples (Prehistoric_Temple_46392_vertex.zip), whereas it's stable at 60 FPS with 8, which means 371,136 vertices. I don't understand why the vertex count at which FPS drops is so different between the two, given that both are idle during my test.

EnergyCube commented 7 months ago

You mentioned you had a laptop, but not the specs

I'm using an AMD CPU with integrated graphics, it's a Ryzen 4700U with a Vega 7 APU.

Result for perfdestroyer2:
Direct3D Original: Blue screen
Direct3D Vertex: Blue screen

Direct3D Original TnL: 8
Direct3D OnlyVertex TnL: 10 🤡
Direct3D OnlyMem TnL: 50
Direct3D Vertex&Mem TnL: 50

I'm really sorry. To be sure I was right, I patched everything again from scratch, and it looks like I made a mistake: my previous "OnlyVertex" test was actually patched with "Vertex&Mem". So you're absolutely right, changing the vertex limit didn't change anything; it's even useless as far as I can see. I'm really confused about game performance and the vertex limit now... That would mean that without TnL you're actually 3x slower, which is unbelievable from my point of view, because non-TnL is what has always worked best in most cases. I would have accepted that this patch improves TnL a little, but not by something like 3x!

Yes, it's quite likely there are other limitations in the engine. It's filling those vertex buffers with data from somewhere else, and those source buffers themselves could also be limited. I found the 8A 09 00 00 number in quite a few other places, maybe some of them are related. I'm not really into modding, but I'll check it if I have some time, maybe it's easy to tell what other numbers would need to be updated.

I assume you're talking about other instances in the game's memory, aren't you? Maybe it's a buffer size that's already defined in different places in the game. The low-level engine is also responsible for a lot of things for the game, so it may have some of the logic inside (I'm not sure, I'm just saying). But patching this would not really fix the performance, it's probably useless to dive into this if the game engine or render can't follow.

In NeoEE, it looks like there are some texts that get very poor quality when running at low resolutions like 800x600, however. I guess the mod was only made with full HD and higher resolutions in mind.

This is the base game, NeoEE doesn't change a thing; it just brings dreXmod, which patches the game menu resolution to a given size (desktop). But that's all: the size of the in-game elements (incl. menu, HUD, etc.) isn't patched by dreXmod or NeoEE, it's entirely native. It just looks like it did 20 years ago. You can temporarily remove dreXmod.dll to see how the game handles resolution.

When I mentioned GetDC earlier, I thought only text was rendered on the CPU, but you mentioned a few other things, and now I'm no longer sure if all of them can be optimized, even from a wrapper. I'll have to check how those other things are drawn first.

I hope it's something wrong that's easy to fix, but sadly I guess some part of the game's 2D render logic just sucks... Anyway, if there is anything that could fix those 2D render issues, it would be awesome.

narzoul commented 7 months ago

My game starts to slow down with my VSync from 30 units (so about 171,000 on screen), do you "consider" this normal or is there possibly other things to do to get better performance. The game is an RTS, so if you make units with more vertices, you can quickly get a lot of vertices on the screen.

I don't know what is considered "normal", I'm not really a game developer, and I only deal with Direct3D as part of DDrawCompat on a hobby level. But I think having that many vertices at the default (furthest) zoom level might be overkill. I think this is usually managed via Level of detail so the vertex/polygon count should ideally depend on the zoom level, but I guess that's not implemented in the original game.

Vertex count alone can also be a bit misleading, polygon count should matter too. The same vertex can be reused by many different polygons. When using indexed primitives (e.g. DrawIndexedPrimitive instead of DrawPrimitive), then reused vertices are generally only transformed once and the transformed version is reused between polygons sharing the same vertex AFAIK.
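The difference between vertex count and polygon count can be illustrated with a flat grid of quads, each split into two triangles. This is a generic sketch, unrelated to the game's actual model format: with an index buffer, each shared corner is stored once, while a non-indexed triangle list repeats it for every triangle that uses it. `grid_mesh` is a hypothetical helper.

```python
def grid_mesh(n):
    """Build an n x n grid of quads as unique vertices plus a triangle index list."""
    vertices = [(x, y) for y in range(n + 1) for x in range(n + 1)]
    indices = []
    for y in range(n):
        for x in range(n):
            i = y * (n + 1) + x  # index of the quad's first corner
            # two triangles per quad, sharing the diagonal
            indices += [i, i + 1, i + n + 1, i + 1, i + n + 2, i + n + 1]
    return vertices, indices


vertices, indices = grid_mesh(10)
expanded = [vertices[i] for i in indices]  # what a non-indexed triangle list would submit

print(len(indices) // 3)  # 200 triangles
print(len(vertices))      # 121 unique vertices with indexing
print(len(expanded))      # 600 vertices without indexing
```

The same 200 triangles need 121 vertices when indexed and 600 when not, so two models with equal vertex counts can differ a lot in how much work they represent.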

menu, hud, etc... isn't patched by dreXmod or NeoEE, it's entirely native

The native elements are ok at lower resolutions too. I meant that there are some texts (apparently introduced by dreXmod) that look bad at 800x600, e.g. above the minimap and in the bottom left corner with the black backgrounds. These aren't there at all without dreXmod. Here are some comparisons with dreXmod at different resolutions:

800x600: 800x600mm 800x600bl

1920x1080: 1920x1080mm 1920x1080bl

EnergyCube commented 7 months ago

Vertex count alone can also be a bit misleading, polygon count should matter too. The same vertex can be reused by many different polygons. When using indexed primitives (e.g. DrawIndexedPrimitive instead of DrawPrimitive), then reused vertices are generally only transformed once and the transformed version is reused between polygons sharing the same vertex AFAIK.

The game uses CEM v2, which does use vertex and texture index buffers, so it would make sense for it to use indexed primitives. CEM v2 only allows triangle polygons, by the way. The game does not differentiate between flat shading and smooth shading. CEM uses vertex normals instead of face normals, meaning everything is smooth shaded.

I think this is usually managed via Level of detail so the vertex/polygon count should ideally depend on the zoom level, but I guess that's not implemented in the original game.

Indeed, but as the base camera is very close to the ground, it's possible that they didn't implement this because it was "unnecessary". Perhaps, if there's a way to define it using Direct3D calls or by modifying the rendering DLL on the fly according to the camera's height, I can make a patch for this. CEM v2 does use 10 LOD levels, though, so it would make sense for the game to use them, but I have no way to confirm it.

The native elements are ok at lower resolutions too. I meant that there are some texts (apparently introduced by dreXmod) that look bad at 800x600, e.g. above the minimap and in the bottom left corner with the black backgrounds. These aren't there at all without dreXmod. Here are some comparisons with dreXmod at different resolutions:

Oh yes, that's dreXmod, but don't waste time on it, it's not important (unless it reduces game performance, in which case I can investigate the code to try to fix it, but we have never had any reports about it). The 2D render slowdown is caused by the things I mentioned in my previous messages.

So I may have sent out too many useless messages and too much information lately, sorry. We so rarely get help with our performance and rendering problems that I may have written too much.

To sum up, DDrawCompat seems to work fine with Empire Earth, in the sense that there are no more render problems or deadlocks. But even if it corrects some rendering problems, it is unfortunately less efficient than a native runtime. You've found a way to make the T&L rendering dll much more efficient and it's something I'll be providing by default in the next update, that's for sure.

The remaining topics would be:

And the remaining extended topics (not directly related to DDrawCompat but for which help could be a game-changer for us if you find something):

narzoul commented 6 months ago

Perhaps if there's a way to define it using Direct3D calls or by modifying the rendering dll on the fly according to the camera's height, I can probably make a patch for this.

Direct3D doesn't have a concept of this type of LOD, it has to be implemented in the game engine. It just renders however many polygons are thrown at it. It's up to the engine to feed fewer polygons to Direct3D based on its own heuristics.
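To make the "engine's own heuristics" part concrete, here is a minimal sketch of distance-based LOD selection. Everything in it is an assumption for illustration: the function name, the `near`/`far` thresholds and the linear mapping are made up, and the default of 10 levels only mirrors the CEMv2 figure mentioned earlier in the thread, which has not been confirmed to be used by the game.

```python
# Hypothetical engine-side heuristic: pick a level of detail from camera distance.
# Direct3D never sees this decision; it only receives the polygons of the chosen LOD.

def select_lod(camera_distance, near=10.0, far=200.0, lod_levels=10):
    """Map a camera distance to a LOD index.

    0 is the most detailed mesh, lod_levels - 1 the coarsest.
    near, far and the linear mapping are illustrative assumptions.
    """
    if camera_distance <= near:
        return 0
    if camera_distance >= far:
        return lod_levels - 1
    fraction = (camera_distance - near) / (far - near)
    return min(lod_levels - 1, int(fraction * lod_levels))


if __name__ == "__main__":
    for distance in (5, 50, 105, 500):
        print(distance, select_lod(distance))
```

The engine would then submit only the mesh for the returned index, so the polygon count sent to Direct3D drops as the camera moves away.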

  • Camera movement is slow when using the direction keys

Probably tied to the perf issues, since I couldn't find a way to reproduce it other than by limiting FPS.

  • DDrawCompat is slower than native in some cases

I found a few optimization possibilities, but it's still not the same as native. I've made hundreds of test runs already while tweaking things and it seems quite impossible to track down the rest, since I often can't even tell if I made any progress at all due to the performance variability between tests even with the same version of the code (or without any wrappers).

Anyway, a few possibilities:

The game doesn't list all GPUs: on devices with both Intel HD Graphics and NVIDIA, they can only use the Intel GPU, making the performance/compatibility really bad (that's just an example, because actually my HD Graphics 3000 is faster than my RTX 2060 in the game...)

This is related to what I linked earlier: https://nvidia.custhelp.com/app/answers/detail/a_id/3733/~/games-based-on-directx-8-and-older-versions-of-directx-will-only-run-on

I don't think I can do anything about it. If I understand correctly (based on this: https://learn.microsoft.com/en-us/windows-hardware/drivers/display/rendering-on-a-discrete-gpu-using-cross-adapter-resources), it involves a lot of tasks in the runtime itself, which was probably never implemented in DirectDraw and Direct3D 8.

Maybe it would be possible to hack something similar into DDrawCompat, but without access to such a hybrid laptop myself, I couldn't even attempt to do it, even if I had time for something like that.

The only solution currently is to use a wrapper that wraps to DX9 or newer, I think.

2D render slows down the entire game render

This might be solvable in DDrawCompat, but it's very complex and I didn't have time to start working on it yet. Most likely won't have time in the near future either, partly due to the large influx of other issues that popped up recently.

Anything game performance related at this point

By some coincidence during all the testing, I found that Empire Earth.exe calls Sleep(1) after each frame is rendered, which wastes 1-2 milliseconds from the next frame time. It's easy enough to replace it with Sleep(0) for some additional FPS boost. Technically, the call could also be removed, but maybe it's better to leave it there as a potential thread switching point in single core scenarios to avoid choking other threads.

The sequence to search for is this: 6A 01 6A 01 8B CE FF 50 10 6A 01. Replace the last 01 with 00.
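For anyone who prefers a script over a hex editor, the search and replace described above could be sketched as follows. This is only an illustration: the function names are made up, and the check that the pattern occurs exactly once is an extra safety assumption, not something verified against every version of the executable.

```python
import shutil
from pathlib import Path

# Byte sequence quoted above; only its final byte (the Sleep argument) is changed.
PATTERN = bytes.fromhex("6A 01 6A 01 8B CE FF 50 10 6A 01")


def patch_sleep_call(data: bytes) -> bytes:
    """Return a copy of data with the last byte of the single PATTERN match set to 00."""
    first = data.find(PATTERN)
    if first < 0:
        raise ValueError("pattern not found")
    if data.find(PATTERN, first + 1) >= 0:
        raise ValueError("pattern found more than once; refusing to patch")
    patched = bytearray(data)
    patched[first + len(PATTERN) - 1] = 0x00
    return bytes(patched)


def patch_file(exe_path: str) -> None:
    """Patch the file in place after saving a .bak copy next to it."""
    path = Path(exe_path)
    patched = patch_sleep_call(path.read_bytes())  # raises before anything is written
    shutil.copy2(path, path.with_name(path.name + ".bak"))
    path.write_bytes(patched)
```

Calling `patch_file` with the path to the game executable would apply the change; if the pattern is missing or ambiguous, the file is left untouched.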


I'm attaching a test version with the aforementioned changes: ddraw.zip (diff.txt compared to v0.5.1)

It also includes the following temporary changes:

This version gives me around 450 FPS, while native is around 550, according to NVIDIA Performance Overlay, assuming the UI is not constantly redrawn.
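To put those FPS numbers on the same scale as the 1-2 milliseconds attributed to Sleep(1) above, it can help to convert them to frame times. The snippet below is just that arithmetic, using the approximate figures quoted in this comment; the helper name is made up.

```python
def frame_time_ms(fps: float) -> float:
    """Convert frames per second to milliseconds per frame."""
    return 1000.0 / fps


wrapper_ms = frame_time_ms(450)  # test version of DDrawCompat, approximate
native_ms = frame_time_ms(550)   # native, approximate
gap_ms = wrapper_ms - native_ms

print(f"wrapper: {wrapper_ms:.2f} ms, native: {native_ms:.2f} ms, gap: {gap_ms:.2f} ms")
# wrapper: 2.22 ms, native: 1.82 ms, gap: 0.40 ms
```

At these frame rates the remaining gap is about 0.4 ms per frame, which gives a sense of how small the remaining overhead is and why run-to-run variability makes it hard to measure.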