Open na-na-hi opened 9 months ago
It seems like the culprit is […] the precompiled object files linked into the final binary
You guessed correctly: the runtime libraries, including mingw32, mingwex, libgcc, and libstdc++, contain SSE2 instructions. If you compile with -march=i686 and do not link these runtime objects, then your binary will not have SSE2. The kit includes a substantial program that can be built this way in case you wanted another test (src/pkg-config.c). Compiled with -march=i386, it even works on ancient machines running Windows NT 3.51.
My reason for doing this is that, at least for GCC-generated code, SSE2 is substantially faster than x87 — orders of magnitude faster — especially in runtime math routines. It's night and day. That's why I even thought to do it. I wondered why some of my 32-bit builds were so slow and -march=native didn't help.
SSE2 hit the market nearly 23 years ago, predating Windows XP, so it seems like a bad trade-off to leave this performance on the table for everyone running hardware less than 20 years old just to support some special cases. Just as you can't turn SSE2 off in the precompiled runtime, had I built the runtime without it, users couldn't turn it on for the performance-critical parts of their programs.
You mentioned Windows NT, but I believe the runtime libraries (aside from libgcc) do not reliably support it anyway, so you couldn't use them even if they were compiled for older targets. I patch Mingw-w64 to support as far back as Windows XP.
The good news is that if you have a special case, it's easy to build a custom w64devkit for it. Apply variant-i686.patch (or start from an i686 release) then tweak to your heart's content. You only need Docker or Podman to build. Sounds like you just want to change that pentium4 line.
Thanks for the explanation. It looks like the main reason behind this decision is the performance of the CRT math routines, which I think isn't a concern for the significant number of programs that don't use floating-point arithmetic at all.
Nonetheless, the intent of the -march flag is to limit the CPU instruction set used, and there is no easy way to know if the final binary fits the criteria because of the precompiled CRT object files. Additionally, the default target instruction set is expected to match the toolchain name, which is the convention on other platforms (i686-linux-gnu target executables are not compiled with SSE instructions on any of the Linux distros I have used). The name of the toolchain (w64devkit-i686) indicates that the target instruction set is i686 rather than something with SSE instructions.
I wonder if it is realistic for w64devkit to provide two sets of precompiled math libraries, one compiled for the "pentium4" instruction set and another compiled for i686, and to provide some way for users to link to either; otherwise it would be clearer if the toolchain were named something like "w64devkit-i686-sse2" instead.
You mentioned Windows NT, but I believe the runtime libraries (aside from libgcc) do not reliably support it anyway, so you couldn't use them even if they were compiled for older targets.
At least for my single-threaded programs compiled with w64devkit, they work as far back as NT 4.0 (with Intel CPU).
Hi, out of curiosity, what application domain are you supporting with a modern toolchain as far back as NT4? @na-na-hi
Indeed, a pentium4 arch might be a better tag but it may cause a lot of inconvenience elsewhere.
Dogmatic i686 has no SSE, as you noted, yet some people think i686 is appropriate for SSE/KNI targets -- which is reasonable for practical purposes. For example: https://archlinux32.org/architecture/
FreeBSD i386 target is actually i686 (not a standalone toolchain but still). Also CMOV debate...
I can only think of Athlon XP as somewhat remotely usable Windows XP / 7 hardware for new applications.
GCC's "x86 Options" documentation says:
This isn't working correctly as expected with w64devkit-i686, using elfx86exts to print the instruction sets used:
This means w64devkit-i686 cannot build binaries targeting CPUs without SSE support, or Windows versions without SSE support (including but not limited to, Windows NT 3.51, or NT 4.0 on a non-Intel x86 CPU). It seems like the culprit is the linker or the precompiled object files linked into the final binary because the object files generated by gcc -march=i686 -c do seem to stick to i686 instructions.
I am aware that the build tools deliberately require a "pentium4" CPU to run, but this issue concerns the build target. Is it possible to use the generic ix86 instruction sets for precompiled object files? These files don't seem to contain any performance-critical code.
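For anyone wanting a quick sanity check of what -march allows in their own translation units, here is a minimal sketch in C. It relies only on the __SSE__ and __SSE2__ macros that GCC predefines when those instruction sets are enabled; the function names are hypothetical and not part of w64devkit. Note that it only reflects the flags used for this one source file. It says nothing about precompiled runtime objects linked in later, which is exactly the gap described above.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if this translation unit was compiled with
 * SSE enabled (GCC predefines __SSE__ in that case), otherwise 0. */
int built_with_sse(void)
{
#ifdef __SSE__
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: same check for SSE2 via the __SSE2__ macro. */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* Print the compile-time target features of this file only.
 * Precompiled runtime objects are not covered by this check. */
void report_target_features(FILE *out)
{
    fprintf(out, "SSE:  %s\n", built_with_sse() ? "enabled" : "disabled");
    fprintf(out, "SSE2: %s\n", built_with_sse2() ? "enabled" : "disabled");
}
```

Calling report_target_features(stdout) from main should report both as disabled when the file is compiled with -march=i686 on a 32-bit target, and both as enabled with -march=pentium4. To inspect the linked binary itself you still need a disassembly-based tool such as elfx86exts.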