Open eggrobin opened 3 years ago
Just curious: is this relevant only for Linux & macOS, or is there something here that might also be useful as an alternative to FMA3 on Windows 10 machines with older chips? I am thinking of users like me who are stuck (for a few more years) with Ivy Bridge processors that support AVX but not FMA3/AVX2.
Because of different alignment requirements and VEX encoding/VZEROUPPER woes, and because it would be so low-level and pervasive (we cannot gate every operation on R3Element behind a CPUID test), we need a separate DLL for that, with an intermediate layer that chooses which DLL to use based on CPUID.
We could generate the intermediate layer from journal.proto and have it resolve the entry points with LoadLibrary/GetProcAddress, etc.
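A minimal sketch of what such an intermediate layer might look like, assuming two builds of the DLL exist side by side. The DLL names `principia_fma.dll` and `principia_generic.dll` and the helpers `CpuHasFma`, `ChooseLibraryName` and `ResolveSymbol` are hypothetical, chosen only for illustration; nothing here is generated from journal.proto. The CPUID test reads bit 12 of ECX in leaf 1, which is the FMA3 feature flag.

```cpp
#include <string>

#if defined(_WIN32)
#include <windows.h>
#endif
#if defined(_MSC_VER)
#include <intrin.h>
#elif defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

// Returns true if CPUID leaf 1 reports FMA3 (ECX bit 12).  A production check
// would also confirm that the OS saves the AVX state (OSXSAVE and XGETBV);
// that step is omitted from this sketch.
bool CpuHasFma() {
#if defined(_MSC_VER)
  int registers[4];
  __cpuid(registers, 1);
  return (registers[2] & (1 << 12)) != 0;
#elif defined(__x86_64__) || defined(__i386__)
  unsigned int eax, ebx, ecx, edx;
  if (__get_cpuid(1, &eax, &ebx, &ecx, &edx) == 0) {
    return false;
  }
  return (ecx & (1u << 12)) != 0;
#else
  return false;  // Not an x86 target: fall back to the generic build.
#endif
}

// Picks which build of the library to load.  Both names are hypothetical.
std::string ChooseLibraryName(bool const has_fma) {
  return has_fma ? "principia_fma.dll" : "principia_generic.dll";
}

#if defined(_WIN32)
// Loads the chosen DLL once and resolves one exported symbol from it.
// Returns nullptr if the DLL or the symbol cannot be found.
FARPROC ResolveSymbol(char const* const symbol) {
  static HMODULE const module =
      LoadLibraryA(ChooseLibraryName(CpuHasFma()).c_str());
  return module == nullptr ? nullptr : GetProcAddress(module, symbol);
}
#endif
```

With this shape the CPUID test runs once, at load time, and every exported function is reached through a pointer obtained from `ResolveSymbol`, so no per-operation gating is needed inside either DLL.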
This blocks FMA usage on Linux & macOS; see https://github.com/mockingbirdnest/Principia/pull/3010#issuecomment-851077178.