skeeto / w64devkit

Portable C and C++ Development Kit for x64 (and x86) Windows
The Unlicense

-march=i686 produces binaries containing SSE instructions #88

Open na-na-hi opened 9 months ago

na-na-hi commented 9 months ago

GCC's "x86 Options" documentation says:

‘i686’ When used with -march, the Pentium Pro instruction set is used, so the code runs on all i686 family chips.

This doesn't work as expected with w64devkit-i686. Here is the output of elfx86exts, which prints the instruction sets used:

$ echo 'void main(){}' > test.c
$ gcc -march=i686 test.c -o test.exe

$ ./elfx86exts.exe test.exe
NOT64BITMODE (ret)
CMOV (cmovns)
SSE2 (movsd)
CPU Generation: Unknown

This means w64devkit-i686 cannot build binaries that target CPUs without SSE support, or Windows versions without SSE support (including, but not limited to, Windows NT 3.51, or NT 4.0 on a non-Intel x86 CPU). The culprit seems to be the linker or the precompiled object files linked into the final binary, because the object files generated by gcc -march=i686 -c do seem to stick to i686 instructions.
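One way to confirm that the compiler itself honors -march, independent of what the linker later pulls in, is to check GCC's predefined target macros. The sketch below assumes GCC's behavior of defining __SSE2__ only when SSE2 code generation is enabled (for example with -march=pentium4, but not with -march=i686). The function name compiled_with_sse2 is hypothetical and used only for illustration.

```c
#include <stdbool.h>

/* Hypothetical helper: reports whether this translation unit was compiled
   with SSE2 code generation enabled. GCC predefines __SSE2__ for targets
   such as -march=pentium4 and leaves it undefined for -march=i686.
   This reflects only the code generated for this file, not any
   precompiled runtime objects (mingw32, mingwex, libgcc, libstdc++)
   that the linker adds to the final executable. */
bool compiled_with_sse2(void)
{
#ifdef __SSE2__
    return true;
#else
    return false;
#endif
}
```

Compiled with gcc -march=i686 -c, this function should return false, yet the linked executable can still contain SSE2 instructions coming from the runtime, which is the mismatch the elfx86exts output shows.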

I am aware that the build tools deliberately require a "pentium4" CPU to run, but this issue concerns the build target. Would it be possible to build the precompiled object files for the generic ix86 instruction set instead? These files don't seem to contain any performance-critical code.

skeeto commented 9 months ago

It seems like the culprit is […] the precompiled object files linked into the final binary

You guessed correctly: the runtime libraries contain SSE2 instructions, including mingw32, mingwex, libgcc, and libstdc++. If you compile with -march=i686 and do not link these runtime objects, your binary will not contain SSE2. In case you want another test, the kit includes a substantial program that can be built this way (src/pkg-config.c). Compiled with -march=i386, it even works on ancient machines running Windows NT 3.51.

My reason for doing this is that, at least for GCC-generated code, SSE2 is substantially faster than x87 (orders of magnitude faster), especially in runtime math routines. It's night and day, and that's why I thought to do it in the first place: I wondered why some of my 32-bit builds were so slow, and -march=native didn't help.
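To see which floating-point path a particular build actually uses, the standard FLT_EVAL_METHOD macro from <float.h> is a quick indicator: with GCC it is 2 when float and double arithmetic is evaluated on the x87 stack in extended precision, and 0 when it is done in SSE registers. This is a sketch assuming GCC's x86 behavior; note that on 32-bit x86, GCC's default is -mfpmath=387, so enabling SSE2 through -march alone does not move scalar math to SSE unless -mfpmath=sse is also given or configured as the default. The helper name fp_eval_description is hypothetical.

```c
#include <float.h>

/* Hypothetical helper: describes how this build evaluates floating-point
   expressions, based on the standard FLT_EVAL_METHOD macro (C99 and later).
   With GCC on x86-32, the default -mfpmath=387 yields 2 (x87 extended
   precision), while -mfpmath=sse with SSE2 available yields 0. */
const char *fp_eval_description(void)
{
    switch (FLT_EVAL_METHOD) {
    case 0:  return "per-type precision (typically SSE)";
    case 1:  return "float promoted to double";
    case 2:  return "long double precision (typically x87)";
    case -1: return "indeterminable";
    default: return "implementation-defined";
    }
}
```

Printing this string from a test program built with the same flags as a real project is a cheap way to check whether its floating-point code is going through x87 or SSE.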

SSE2 hit the market nearly 23 years ago, predating Windows XP, so it seems like a bad trade-off to leave this performance on the table for everyone running hardware less than 20 years old just to support a few special cases. And just as you can't turn SSE2 off in the precompiled runtime now, had I built the runtime without it, users couldn't turn it on for the performance-critical parts of their programs.
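Whether a particular target machine falls into one of those special cases can be checked directly: SSE2 support is reported by CPUID leaf 1 in EDX bit 26. The sketch below assumes GCC's (or Clang's) <cpuid.h>, which provides __get_cpuid and the bit_SSE2 mask; the function name cpu_has_sse2 is hypothetical.

```c
#include <stdbool.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>
#endif

/* Hypothetical helper: asks the CPU at run time whether it implements
   SSE2 (CPUID leaf 1, EDX bit 26). On non-x86 hosts it returns false.
   __get_cpuid returns 0 when leaf 1 is unavailable, e.g. on CPUs that
   predate CPUID support. */
bool cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false; /* leaf 1 not supported */
    }
    return (edx & bit_SSE2) != 0;
#else
    return false;
#endif
}
```

In an executable linked against an SSE2-enabled runtime, that runtime may already have executed SSE2 code before such a check runs, so this probe is most useful as a standalone diagnostic or in a program built without the runtime objects.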

You mentioned Windows NT, but I believe the runtime libraries (aside from libgcc) do not reliably support it anyway, so you couldn't use them even if they were compiled for older targets. I patch Mingw-w64 to support as far back as Windows XP.

The good news is that if you have a special case, it's easy to build a custom w64devkit for it. Apply variant-i686.patch (or start from an i686 release), then tweak to your heart's content. You only need Docker or Podman to build. It sounds like you just want to change that pentium4 line.

na-na-hi commented 9 months ago

Thanks for the explanation. It looks like the main reason behind this decision is the performance of the CRT math routines, which I don't think matters for the many programs that don't use floating-point arithmetic at all.

Nonetheless, the intent of the -march flag is to limit the CPU instruction set used, and because of the precompiled CRT object files there is no easy way to know whether the final binary meets that criterion. Additionally, the default target instruction set is expected to match the toolchain name, which is the convention on other platforms (on all the Linux distros I've used, i686-linux-gnu executables are not compiled with SSE instructions). The toolchain's name (w64devkit-i686) suggests that the target instruction set is i686 rather than something with SSE instructions.

I wonder whether it is realistic for w64devkit to provide two sets of precompiled math libraries, one compiled for the "pentium4" instruction set and another for i686, along with some way for users to link against either. Otherwise, it would be clearer to name the toolchain something like "w64devkit-i686-sse2".

You mentioned Windows NT, but I believe the runtime libraries (aside from libgcc) do not reliably support it anyway, so you couldn't use them even if they were compiled for older targets.

At least my single-threaded programs compiled with w64devkit work as far back as NT 4.0 (on an Intel CPU).

arkadijs commented 1 month ago

Hi, out of curiosity, what application domain are you supporting with a modern toolchain as far back as NT4? @na-na-hi

Indeed, pentium4 might be a better tag for the arch, but it may cause a lot of inconvenience elsewhere.

Strictly speaking, i686 has no SSE, as you noted, yet some people consider i686 appropriate for SSE/KNI targets, which is reasonable for practical purposes. For example, see https://archlinux32.org/architecture/. The FreeBSD i386 target is actually i686 (not a standalone toolchain, but still). There's also the CMOV debate...

I can think of only the Athlon XP as even remotely usable Windows XP / 7 hardware for new applications.