microsoft / STL

MSVC's implementation of the C++ Standard Library.

Support for CPUs which do not have SSE2 extensions? #3118

Closed strega-nil-ms closed 2 months ago

strega-nil-ms commented 1 year ago

Currently, on x86, we support /arch:IA32, and build our separately compiled sources with SSE2 support disabled. Is this still necessary, or can we allow ourselves to assume SSE2 hardware?

Notes:

barcharcraz commented 1 year ago

Given that /arch:IA32 is the default on x86, we probably have to support it. However, we may be able to get away with building the DLLs with SSE2.

Alcaro commented 1 year ago

No, /arch:IA32 is not the default on x86. Proof: This code gives different results on IA32 vs no flags. (It also proves that IA32 automatically promotes every float32 to float64 before doing any math.) https://godbolt.org/z/Pv7Go5Te8

The relevant 2018 update is https://support.microsoft.com/en-us/topic/may-8-2018-kb4103718-monthly-rollup-c4c01989-faca-af5f-46f4-2bdc2d0171fd.

AlexGuteniev commented 1 year ago

Might be an issue for 32-bit kernel mode usage.

barcharcraz commented 1 year ago

> No, /arch:IA32 is not the default on x86. Proof: This code gives different results on IA32 vs no flags. (It also proves that IA32 automatically promotes every float32 to float64 before doing any math.) https://godbolt.org/z/Pv7Go5Te8
>
> The relevant 2018 update is https://support.microsoft.com/en-us/topic/may-8-2018-kb4103718-monthly-rollup-c4c01989-faca-af5f-46f4-2bdc2d0171fd.

You're right, although the floating-point difference arises because /arch:IA32 uses x87 floating-point instructions, which use 80-bit registers.
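The effect of wider intermediate precision can be illustrated with a small sketch. This is not the code behind the Godbolt link; the function names are hypothetical. It relies only on the standard `FLT_EVAL_METHOD` macro from `<cfloat>` and on the fact that 2^24 + 1 is not representable as a `float`.

```cpp
#include <cassert>
#include <cfloat>

// Reports how the implementation evaluates floating-point expressions:
// 0 means each operation uses the precision of its operand type, while
// 1 or 2 means intermediates are kept in a wider format.
inline int eval_method() { return FLT_EVAL_METHOD; }

// Computes (2^24 + 1) - 2^24 with float operands. The volatile qualifiers
// keep the compiler from folding the arithmetic at compile time.
// Evaluated strictly in float, the sum rounds back to 2^24 and the result
// is 0. If the intermediate sum is held in a wider format, the result is 1.
inline float residual_after_add() {
    volatile float big = 16777216.0f;  // 2^24
    volatile float one = 1.0f;
    return (big + one) - big;
}
```

Printing `eval_method()` and `residual_after_add()` from builds with and without /arch:IA32 is one way to observe the difference. When `eval_method()` is 0 the residual should be 0; an x87 build may report a different evaluation method and a residual of 1.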

CaseyCarter commented 1 year ago

From https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?view=msvc-170:

> /arch:SSE2: Enables the use of SSE2 instructions. This option is the default instruction set on x86 platforms if no /arch option is specified.
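The default can also be checked from inside a translation unit. The sketch below uses MSVC's predefined `_M_IX86_FP` macro, which is 0 under /arch:IA32, 1 under /arch:SSE, and 2 under /arch:SSE2 and later. The function name `x86_fp_mode` is hypothetical, and the fallback branch only exists so the snippet compiles with other compilers and targets.

```cpp
#include <cassert>
#include <string>

// Returns a label for the x86 floating-point code generation mode selected
// at compile time. _M_IX86 and _M_IX86_FP are MSVC predefined macros; any
// other compiler or architecture takes the fallback branch.
inline std::string x86_fp_mode() {
#if defined(_M_IX86) && defined(_M_IX86_FP)
  #if _M_IX86_FP == 0
    return "IA32 (x87)";
  #elif _M_IX86_FP == 1
    return "SSE";
  #else
    return "SSE2 or later";
  #endif
#else
    return "not 32-bit x86 MSVC";
#endif
}
```

Compiling a program that prints `x86_fp_mode()` for x86 with no /arch option should report "SSE2 or later", consistent with the documentation quoted above.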

StephanTLavavej commented 1 year ago

We talked about this at the weekly maintainer meeting. Although the potentially affected set of users is extremely small, it would be very severe if installing an updated redist caused code to fail at runtime. In general, we have very little code affected by /arch:IA32 or the availability of SSE2 (from a quick scan, it's Special Math, vectorized algorithms, and the __vectorcall calling convention), so the benefits of making such a general change would be relatively small (e.g. in comparison to dropping Vista support, which allowed us to remove a massive amount of code and significant runtime logic for Win7+ users).

However, Special Math is a special case: it is implemented in a separate "satellite DLL", and @strega-nil-ms has found that the availability of SSE2 impacts its precision (and presumably its performance). @CaseyCarter noted that we could change just the Special Math satellite DLL to use SSE2, which would be an extremely safe change. Only programs actually using Special Math would be affected, as it is a pure leaf of the STL, and this satellite DLL was added relatively recently (VS 2017), so it is extraordinarily unlikely that machines with ancient processors are running code that uses it.

Note: such a change would need to happen in both the GitHub/CMake and internal/MSBuild build systems.
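For context on the "fail at runtime" concern: code compiled to use SSE2 unconditionally would hit an illegal instruction on a CPU without it, unless the program checks first. A minimal sketch of such a check follows, assuming the CPUID intrinsics from `<intrin.h>` (MSVC) or `<cpuid.h>` (GCC/Clang). The names `cpu_has_sse2`, `target_is_x64`, and `SKETCH_HAVE_CPUID` are hypothetical and this is not how the STL itself is implemented.

```cpp
#include <cassert>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define SKETCH_HAVE_CPUID 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>
  #define SKETCH_HAVE_CPUID 1
#else
  #define SKETCH_HAVE_CPUID 0
#endif

// True when compiling for x86-64, where SSE2 is part of the baseline ISA.
#if defined(__x86_64__) || defined(_M_X64)
constexpr bool target_is_x64 = true;
#else
constexpr bool target_is_x64 = false;
#endif

// Returns true when CPUID leaf 1 reports SSE2 support (EDX bit 26).
// On targets without CPUID this sketch conservatively returns false.
inline bool cpu_has_sse2() {
#if SKETCH_HAVE_CPUID
  #if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);  // regs = {EAX, EBX, ECX, EDX}
    return ((static_cast<unsigned>(regs[3]) >> 26) & 1u) != 0;
  #else
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
    return ((edx >> 26) & 1u) != 0;
  #endif
#else
    return false;
#endif
}
```

A caller could branch on `cpu_has_sse2()` before entering an SSE2-only code path. A check like this only helps if the checking code itself is compiled without SSE2, which is why the build flags of the separately compiled sources matter.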

AlexGuteniev commented 1 year ago

Does building Special Math with /fp:strict or /fp:precise fix the precision issue?

strega-nil-ms commented 1 year ago

@AlexGuteniev no, we already build with /fp:strict, and that doesn't really have anything to do with why the result is different on non-SSE2 chips. The implementation of the special math functions does quite a bit of logic, and that logic is (necessarily) different on machines without SSE2, and on machines with SSE2.
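One way to compare precision between builds is to measure a special function against a known closed form. The sketch below assumes "Special Math" refers to the C++17 mathematical special functions in `<cmath>` and uses `std::riemann_zeta(2.0)`, whose exact value is pi^2 / 6. The helper names are hypothetical, and the feature-test guard lets the snippet compile on libraries or language modes that lack these functions.

```cpp
#include <cassert>
#include <cmath>

// Feature-test macro defined by <cmath> when the C++17 special math
// functions are available.
#if defined(__cpp_lib_math_special_functions)
constexpr bool special_math_available = true;
#else
constexpr bool special_math_available = false;
#endif

// Relative error of std::riemann_zeta(2.0) against the closed form pi^2 / 6.
// Returns 0.0 when the library does not provide the special functions.
inline double zeta2_relative_error() {
#if defined(__cpp_lib_math_special_functions)
    const double pi = std::acos(-1.0);
    const double exact = pi * pi / 6.0;
    const double computed = std::riemann_zeta(2.0);
    return std::fabs(computed - exact) / exact;
#else
    return 0.0;
#endif
}
```

Building this with and without SSE2 code generation and printing `zeta2_relative_error()` with full precision would show whether the two builds agree for this one input. It says nothing about other functions or arguments.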