luciusDXL / TheForceEngine

Modern "Jedi Engine" replacement supporting Dark Forces, mods, and in the future Outlaws.
https://TheForceEngine.github.io
GNU General Public License v2.0
946 stars 71 forks source link

[RFC] Linux: Make GCC LTO (Link Time Optimizations) work #411

Open mlauss2 opened 1 month ago

mlauss2 commented 1 month ago

While building with -flto and gcc on Linux does succeed, the resulting binary crashes at seemingly random places. This set of 3 patches removes all warnings emitted by GCC during the LTO run, and the resulting binary actually works fine as well (played a few levels, mods, saved/loaded savegames from both the lto-patched and vanilla binary, no issues whatsoever).

Comparison of binary size (-O3 -march=znver4):

   text    data     bss     dec     hex filename
5754171   84401 3355328 9193900  8c49ac theforceengine-gcc15
4317669   48488 3215256 7581413  73aee5 theforceengine-gcc15-lto

Size reduction of 1.54MiB or ~17,5%

The thing that LTO was most confused by is apparently the wraparound-to-nullptr design in the TFE_Jedi Allocator, I changed it a bit to use regular nullptrs. The rest is namespace cleanups to get rid of C++ ODR violations which only appear with lto enabled, and nullpointer checks to call allocator_newItem() callsites (which fixes an lto-only allocator pointer corruption encountered with dark tide 1 mod).

Clang LTO also works, but generates a 10% bigger binary (compared to non-lto build).

What do you think?

luciusDXL commented 2 days ago

Seems useful. This mostly touches the reverse-engineered code (including the Jedi allocator). I went through the changes, but might need to patch and test it for a bit to be sure. :)

mlauss2 commented 2 days ago

I did build and run test it on windows (together with the other PRs), and it does run fine there as well. I don't know whether MSVC supports lto though :)

luciusDXL commented 2 days ago

I did build and run test it on windows (together with the other PRs), and it does run fine there as well. I don't know whether MSVC supports lto though :)

I'm not too worried about that, as long as the code compiles and runs correctly (and with the same performance). After some additional testing, I will probably merge it in after the next release (sometime this week).