luciusDXL / TheForceEngine

Modern "Jedi Engine" replacement supporting Dark Forces, mods, and in the future Outlaws.
https://TheForceEngine.github.io
GNU General Public License v2.0
946 stars 71 forks source link

Avoid busy waiting on the main and the midi threads. #405

Open MIvanchev opened 2 months ago

MIvanchev commented 2 months ago

Solves #402.

Hi, I would like to suggest the following changes that dramatically improve the CPU usage by avoiding busy waiting in the threads. For the main thread I suggest using SDL_WaitEventTimeout in the frame limiting which pauses the thread until events arrive because I believe it's important to process events immediately. So if your mouse is, say, 1200Hz you will not be limited to 60FPS.

For the MIDI thread I propose a command condition which also awakes the thread immediately. Otherwise the thread is paused until it's time to play another note.

For me this brings down CPU usage from 30% to 10% and the fan stays off the whole time.

luciusDXL commented 2 months ago

I have seen this and plan on going through it after the next release is out (any day now). Thanks for the suggestions and PR. :)

mlauss2 commented 2 months ago

Ooh, this is pretty great. With this applied, CPU usage went from 110% down to 6% with vsync enabled. @MIvanchev: it needs a rebase, but otherwise looks fine to me!

MIvanchev commented 2 months ago

@mlauss2 Let me rebase, running your CPU at 110% long term is not a good idea!!!

MIvanchev commented 2 months ago

@mlauss2 I've rebased.

mlauss2 commented 1 month ago

I tried the frame limiter changes now as well. When not moving mouse/using keyboard, the framerate is about 5% off, when moving the character it goes up to 120, and the whole thing is really choppy when target framerate is set to 30fps. I suspect that using SDL_WaitEventTimeout() is an unsuitable wait function since it advances the game prematurely whenever input activity is detected, which the frame limiter in its current design wants to actually prevent.

I'd say drop the frame limiter changes for now; the midi thread parts are great and should go in asap as they make a world of difference wrt. cpu usage.

MIvanchev commented 1 month ago

Thanks for all your testing!

When not moving mouse/using keyboard, the framerate is about 5% off, when moving the character it goes up to 120, and the whole thing is really choppy when target framerate is set to 30fps.

Yeah that limiter was not that good, let's kick it out.

mlauss2 commented 1 month ago

@MIvanchev please redo the patch with the frame limiter bits removed. Thanks!

MIvanchev commented 1 month ago

I haven't forgotten @mlauss2, just need some time to do it :) Let me get to it ASAP.

MIvanchev commented 4 days ago

I've changed the code a bit, maybe you friends can check!

mlauss2 commented 4 days ago

gcc says:

midiPlayer.cpp:530:51: error: no matching function for call to 'max(<brace-enclosed initializer list>)'
  530 |                                 timeout = std::max({0, timeout});
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:257:5: note: candidate: 'template<class _Tp> constexpr const _Tp& std::max(const _Tp&, const _Tp&)'
  257 |     max(const _Tp& __a, const _Tp& __b)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:257:5: note:   candidate expects 2 arguments, 1 provided
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:303:5: note: candidate: 'template<class _Tp, class _Compare> constexpr const _Tp& std::max(const _Tp&, const _Tp&, _Compare)'
  303 |     max(const _Tp& __a, const _Tp& __b, _Compare __comp)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algobase.h:303:5: note:   candidate expects 3 arguments, 1 provided
In file included from /usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/algorithm:61,
                 from midiPlayer.cpp:15:
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5732:5: note: candidate: 'template<class _Tp> constexpr _Tp std::max(initializer_list<_Tp>)'
 5732 |     max(initializer_list<_Tp> __l)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5732:5: note:   template argument deduction/substitution failed:
midiPlayer.cpp:530:51: note:   deduced conflicting types for parameter '_Tp' ('int' and 'double')
  530 |                                 timeout = std::max({0, timeout});
      |                                           ~~~~~~~~^~~~~~~~~~~~~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5742:5: note: candidate: 'template<class _Tp, class _Compare> constexpr _Tp std::max(initializer_list<_Tp>, _Compare)'
 5742 |     max(initializer_list<_Tp> __l, _Compare __comp)
      |     ^~~
/usr/lib/gcc/x86_64-pc-linux-gnu/15/include/g++-v15/bits/stl_algo.h:5742:5: note:   candidate expects 2 arguments, 1 provided

same for next line with std::min(). I think gcc expects a comparison function as 2nd parameter.

Why so complicated though, this here works well too, both clang and gcc are happy with it (no overflow/truncation warnings):

                                u64 sleeptime = (s_midiCallback.timeStep - s_midiCallback.accumulator) * 1000;
                                u32 timeout = sleeptime >= SDL_MUTEX_MAXWAIT ?: (u32)sleeptime;
                                SDL_CondWaitTimeout(s_midiCmdCond, s_mutex, timeout);
MIvanchev commented 3 days ago

I'm not sure if infinities and negative values are undefined behavior in that case, otherwise a good solution unless someone changes the types of timeStep/accumulator. In general, C++ has facilities to make the code a little bit safer but they don't seem to be well supported 🤭

MIvanchev commented 3 days ago

OK the deduction should work now, it's hard to test for me RN cuz I'm on a very limited PC... it seems to work with GCC.

mlauss2 commented 2 days ago

OK the deduction should work now, it's hard to test for me RN cuz I'm on a very limited PC... it seems to work with GCC.

Yes both clang and gcc are now happy.

mlauss2 commented 2 days ago

@luciusDXL please consider to include this PR into the next release. It makes a world of difference wrt. CPU usage.