mupen64plus / mupen64plus-core

Core module of the Mupen64Plus project

Compiling using makefile generates slow executable #1087

Open Morilli opened 1 month ago

Morilli commented 1 month ago

I'm compiling on Windows using standard MSYS/MinGW tools, and I noticed that compiling the core using the makefile actually results in a slower executable than using the provided vcxproj and compiling with MSVC tools. I compiled both with no modifications to the provided files, i.e. a standard (release|x64) build for the vcxproj and a simple make all for the makefile.

Some data:

  - Using the makefile with gcc: ~33fps
  - Using the vcxproj with MSVC: ~75fps
  - Using the vcxproj with clang-cl (from mingw): ~55fps

I tested this with the provided 2.6.0 bundle and the GlideN64 video plugin (to get fps display) and replacing the core dll with the respective compilation outputs. I also set the core type to Cached Interpreter for this test.

There is no way that

  1. msvc actually generates faster code than gcc
  2. a compiler generates code that is more than twice as fast as another

Any ideas on what could be causing this difference? I find it very hard to believe that msvc actually just compiles that much faster code.

Jj0YzL5nvJ commented 1 month ago

MSVC is "smarter"... it automatically enables the SIMD extensions that the target OS requires as a prerequisite.

You can do something similar for the Makefile by defining your own OPTFLAGS. You can enable individual extensions or use the profiles already defined for that purpose.

https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html

Interprocedural optimizations are trickier and I don't understand them... https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html

Examples:

https://github.com/mupen64plus/mupen64plus-core/blob/master/.github/workflows/scripts/ci_build.sh#L74

https://github.com/simple64/simple64/blob/main/mupen64plus-core/CMakeLists.txt#L126-L137

Morilli commented 1 month ago

Thanks for the suggestions. I tried setting OPTFLAGS="-Ofast -flto -march=native -mtune=native", but I'm not seeing any difference in runtime speed. With V=1 I can see that the options are applied to the compilation, but apparently they don't affect the outcome to any significant degree. I have also tried just spamming some more options (-msse4 -mavx -mavx2 -msse4.2 -msse), but that doesn't seem to do anything either.

Jj0YzL5nvJ commented 1 month ago

I don't think the differences are really visible with dynarecs... the differences are more noticeable when using interpreters (--emumode 0 / --emumode 1) and RDPs that support multithreading, but ui-console is not optimized for multithreading in general. Even so, make sure to disable any type of vsync and use "Speed Limiter Toggle" for your benchmark. The Conker's Bad Fur Day intro is an excellent test case.

Morilli commented 1 month ago

I was testing with the cached interpreter (emumode 1) before, but set it to 0 now. I've also disabled the speed limiter, although I've always measured fps with fast-forward active (holding F), so it probably doesn't matter. vsync should also not be active, otherwise I wouldn't be able to fast-forward at all.

I've tried with the game you suggested and the results are even worse: Using the gcc-compiled core, I get between 15 and 20 fps in the intro. When using a self-compiled msvc core, I get between 50 and 60 fps.