llvmbot opened this issue 3 years ago
In addition to the scenarios above, below is another problem case that we have observed. This time, many XMM registers are preserved even though they should not be.
void test(void)
{
    __asm
    {
        VZEROUPPER
    }
}
Compiled with
clang-cl /O2 /FA -c test.cpp
produces
#APP
vzeroupper
#NO_APP
ret
which looks correct. However, when compiled with
clang-cl -mavx2 /O2 /FA -c test.cpp
produces
sub rsp, 168
vmovaps xmmword ptr [rsp + 144], xmm15 # 16-byte Spill
vmovaps xmmword ptr [rsp + 128], xmm14 # 16-byte Spill
vmovaps xmmword ptr [rsp + 112], xmm13 # 16-byte Spill
vmovaps xmmword ptr [rsp + 96], xmm12 # 16-byte Spill
vmovaps xmmword ptr [rsp + 80], xmm11 # 16-byte Spill
vmovaps xmmword ptr [rsp + 64], xmm10 # 16-byte Spill
vmovaps xmmword ptr [rsp + 48], xmm9 # 16-byte Spill
vmovaps xmmword ptr [rsp + 32], xmm8 # 16-byte Spill
vmovaps xmmword ptr [rsp + 16], xmm7 # 16-byte Spill
vmovaps xmmword ptr [rsp], xmm6 # 16-byte Spill
#APP
vzeroupper
#NO_APP
vmovaps xmm6, xmmword ptr [rsp] # 16-byte Reload
vmovaps xmm7, xmmword ptr [rsp + 16] # 16-byte Reload
vmovaps xmm8, xmmword ptr [rsp + 32] # 16-byte Reload
vmovaps xmm9, xmmword ptr [rsp + 48] # 16-byte Reload
vmovaps xmm10, xmmword ptr [rsp + 64] # 16-byte Reload
vmovaps xmm11, xmmword ptr [rsp + 80] # 16-byte Reload
vmovaps xmm12, xmmword ptr [rsp + 96] # 16-byte Reload
vmovaps xmm13, xmmword ptr [rsp + 112] # 16-byte Reload
vmovaps xmm14, xmmword ptr [rsp + 128] # 16-byte Reload
vmovaps xmm15, xmmword ptr [rsp + 144] # 16-byte Reload
add rsp, 168
vzeroupper
ret
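As VZEROUPPER only zeroes the upper bits (bit 128 and above) of YMM0-YMM15 and leaves the low 128 bits intact, the inline assembly does not touch the callee-saved part of xmm6-xmm15, so there is no reason to save and restore them. The additional VZEROUPPER that the compiler inserts before ret is also redundant, since nothing between the inline VZEROUPPER and the ret dirties the upper halves again.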
Author: None (llvmbot)
Extended Description
The issue was first observed with clang 10.0 bundled with MS Visual Studio 2019 on Windows, and later confirmed with clang 7.0.1 on Linux (CentOS 7.7) and with clang 12.0 bundled with Xcode 12.2 on Mac OS.
Here is the minimal reproducible example:
When compiled on Windows with
it produces the following assembly (metadata omitted for clarity):
As you can see, XMM6 is not preserved even though it is clobbered by the vpxor instruction.
If I pass the -mavx2 flag to the compiler, however,
the generated assembly becomes
XMM6 is now preserved.
The same issue is present on Linux and Mac OS; however, ms_abi must now be specified explicitly:
Compiling on Linux with
produces
Compiling with
produces
Compiling on Mac OS with
produces
Compiling with
produces
Additional comments and observations.