luciusDXL / TheForceEngine

Modern "Jedi Engine" replacement supporting Dark Forces, mods, and in the future Outlaws.
https://TheForceEngine.github.io
GNU General Public License v2.0
976 stars 71 forks source link

More CPU activity (even with OpenGL renderer) than dosbox #402

Open MIvanchev opened 5 months ago

MIvanchev commented 5 months ago

Hi there, I wanted to replay the DF campaign and figured out the best way to go ahead would be TheForceEngine (great project btw). However on my Thinkpad with iGPU it runs (in every configuration) slower than DF in DosBox. Even with the OpenGL renderer I experience 50% CPU activity and the fan doesn't stop blowing where as dosbox never turns the fan on. Any advice on this? I'm playing on Linux.

MIvanchev commented 5 months ago

Did some research, the CPU usage spikes immediately after the program is launched, on the 2D menu. top shows 3 processes, 1 with very high CPU activity and 2 with ~5%.

Gerwin2k commented 5 months ago

It was told somewhere in this repo that VSync enable would also schedule CPU-idling. But that did not work for my WIN32 environment. So a few months ago I hacked up this bit below , that works OK for the time being. Should be cross-platform. Result is a total of 30% use of 4-core CPU, spread evenly across the four cores.

file: TFE_System/system.cpp

    void sleep(u32 sleepDeltaMS)
    {
        if (sleepDeltaMS==0) SDL_Delay(1);
        else SDL_Delay(sleepDeltaMS); 
    }
MIvanchev commented 5 months ago

Yes, you appear to be correct, the main loop of TFE seems to do busy waits so the CPU is maxed out all the time... This seems to be a common SDL issue. Maybe a long term solution would involve SDL_WaitEventTimeout.

luciusDXL commented 5 months ago

So you mean that it busy waits on your system even with the frame limiter enabled. The idea is to avoid wasting too much time but to give other threads a chance to run. But Windows has always been a bit iffy here - so I will try out your idea and see how well it works (with the minor change of specifying the real value in frameLimiter_end() rather than changing the value in the function).

luciusDXL commented 5 months ago

So it turns out using the 1ms sleep reduced the accuracy of the frame limiter too much, but with some minor modifications I was able to restore it. The change has been submitted and will be in the next release.

MIvanchev commented 5 months ago

@luciusDXL You also need to address the midi thread, it maxes out a thread as well and doesn't even do sleep(0) just loops continuously.

Gerwin2k commented 5 months ago

Thanks luciusDXL. I will try it when I have some time to spare.

I was slightly involved with a similar adjustment in another game port, some months ago. There it was implemented like this:

https://github.com/LostArtefacts/TR1X/pull/1002/files

luciusDXL commented 5 months ago

@luciusDXL You also need to address the midi thread, it maxes out a thread as well and doesn't even do sleep(0) just loops continuously.

I saw your PR and do plan on going through that in the near future. But this was a fairly simple change, so I wanted to at least improve the frame limiter issue before the next release (which should be any day now).

luciusDXL commented 5 months ago

Thanks luciusDXL. I will try it when I have some time to spare.

I was slightly involved with a similar adjustment in another game port, some months ago. There it was implemented like this:

https://github.com/LostArtefacts/TR1X/pull/1002/files

I had considered that approach, but wasn't sure about the quality of the timing, which is why I had the sleep(0) originally. But its probably better than I'm thinking, so I will give it a shot after the next release is out. Thanks for the link and feedback. :)

luciusDXL commented 5 months ago

Anyway, the plan is to keep this issue open until improvements are made to the midi thread, as mentioned above.

Gerwin2k commented 5 months ago

I just tried the recent commit. Since it was very simple to implement. https://github.com/luciusDXL/TheForceEngine/commit/273dd41adcaa41d871ddc18ff53f71c6a9a7cfe7 That "DBL_EPSILON + 0.001;" reminds me of the +0.1 I did here. https://github.com/LostArtefacts/TR1X/issues/985

With the recent commit I get on average 45% total CPU usage among 4 cores. Regular drops from 60 FPS to 59. Sometimes 58.

My earlier hackfix: 30% total CPU usage among 4 cores. Less often it drops from 60 FPS to 59. Never drops to 58. Maybe because my hackfix catches some sleep(0) calls from other places, which it converts to SDL_Delay(1)?

Gerwin2k commented 5 months ago

Two other tests, WIN32:

pull request by MIvanchev frameLimiter.cpp + midiPlayer.cpp https://github.com/luciusDXL/TheForceEngine/pull/405 On average 27% total CPU usage among 4 cores. Regular drops from 60 FPS to 59. A few times 58.

Pull request by MIvanchev, but only midiPlayer.cpp / Old framelimiter.cpp / My hackfix for sleep() in system.cpp. On average 7% total CPU usage among 4 cores. Less often it drops from 60 FPS to 59. Never drops to 58.

Edit: That last combi seemed like a keeper. But there was a hanging note in the midi music after 5 minutes of play.

MIvanchev commented 5 months ago

Nice testing @Gerwin2k. The main thread is actually a bit of a debate terrain. My personal conviction is that the limiting shouldn't block events (i.e. sleep / SDL_Delay vs SDL_WaitEventTiemout) but there are numerous counterexamples and I think it's subjective. Anyhow, in my case it's not a significant issue because vsync already ensures a 60 FPS limit. The midi thread is another story and it's easier to fix. I'm surprised you're getting a hanging note because there's detection logic for that.