jstarks opened 1 week ago
Thank you for pointing this out!
It feels dubious to call it a "breaking change" to apply this diff:
```diff
 /// Always returns true.
 pub fn returns_true() -> bool {
-    false
+    true
 }
```
But regardless of whether it is or isn't, I agree that `clobber_abi` should do exactly what it says it does, because it is very important that unsafe code uphold its contracts precisely. We should fix this soon, considering that we're about to see x86 architectures with 32 GPRs!
It looks like this was known by the author (@Amanieu):
```rust
InlineAsmClobberAbi::X86_64Win => clobbered_regs! {
    X86 X86InlineAsmReg {
        // rdi and rsi are callee-saved on windows
        ax, cx, dx, r8, r9, r10, r11,
        // xmm6-xmm15 are callee-saved on windows, but we need to
        // mark them as clobbered anyways because the upper portions
        // of ymm6-ymm15 are volatile.
        xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7,
        xmm8, xmm9, xmm10, xmm11, xmm12, xmm13, xmm14, xmm15,
        zmm16, zmm17, zmm18, zmm19, zmm20, zmm21, zmm22, zmm23,
        zmm24, zmm25, zmm26, zmm27, zmm28, zmm29, zmm30, zmm31,
        k0, k1, k2, k3, k4, k5, k6, k7,
        mm0, mm1, mm2, mm3, mm4, mm5, mm6, mm7,
        st0, st1, st2, st3, st4, st5, st6, st7,
        tmm0, tmm1, tmm2, tmm3, tmm4, tmm5, tmm6, tmm7,
    }
},
Oh, right!
I don't think we have a reasonable way to mark "only half a register" as clobbered!
Does LLVM have a way to express these partial clobbers in its assembly language?
And if you look just below, you'll see we have the same issue on AArch64.
```rust
AArch64 AArch64InlineAsmReg {
    x0, x1, x2, x3, x4, x5, x6, x7,
    x8, x9, x10, x11, x12, x13, x14, x15,
    x16, x17, x18, x30,
    // Technically the low 64 bits of v8-v15 are preserved, but
    // we have no way of expressing this using clobbers.
    v0, v1, v2, v3, v4, v5, v6, v7,
    v8, v9, v10, v11, v12, v13, v14, v15,
    v16, v17, v18, v19, v20, v21, v22, v23,
    v24, v25, v26, v27, v28, v29, v30, v31,
    p0, p1, p2, p3, p4, p5, p6, p7,
    p8, p9, p10, p11, p12, p13, p14, p15,
    ffr,
}
```
This is unfortunately a limitation of LLVM: it doesn't give us a way to indicate that only half of a register is clobbered.
Yeah, ideally there would be a special "vzeroupper" clobber, but it seems there isn't one.
So it seems this is not a Windows-specific issue! It is also less of a concern for the x86 extension I mentioned, since that extension doesn't change the definition of any existing registers: the new registers simply didn't exist before!
It looks like the Windows x64 version of this bug has already been reported in https://github.com/llvm/llvm-project/issues/50566
Regardless of if/when LLVM gets fixed to handle partial clobbers, the documentation could still be updated to guarantee only that the upper portions of y/zmm6-15 are marked as clobbered by `clobber_abi("C")`. The current implementation would still be valid, as the compiler is free to save and restore more registers (or parts of registers) than the `asm!` author clobbered.
On x86_64-pc-windows-msvc, the system ABI considers `xmm0`-`5` volatile/caller-save registers and `xmm6`-`15` non-volatile/callee-save registers. However, `clobber_abi("C")` marks `xmm6`-`15` as clobbered, which produces a bunch of wasteful register saves/restores.

This behaves according to the documentation (which specifies that xmm6-15 will be clobbered), but it's almost certainly not what the developer wanted, because the unnecessary clobbering causes performance problems. The workaround is to avoid using `clobber_abi` on Windows and to clobber registers manually; this is a pain and may cause problems down the line as more volatile registers are added to the ABI.

So a breaking change seems warranted here.
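The manual workaround described above might be sketched as follows, under the assumption (from the Windows x64 calling convention) that only `rax`, `rcx`, `rdx`, `r8`-`r11`, and `xmm0`-`xmm5` are volatile; the function `double` and its asm payload are purely illustrative:

```rust
use std::arch::asm;

// Illustrative sketch of the manual-clobber workaround: instead of
// clobber_abi("C"), list only the registers the Windows x64 ABI treats
// as volatile. Fragile by construction: the list must be updated by
// hand if the ABI ever grows new volatile registers.
#[cfg(all(target_arch = "x86_64", target_os = "windows"))]
fn double(mut x: u64) -> u64 {
    unsafe {
        asm!(
            "add {x}, {x}",
            x = inout(reg) x,
            // Volatile GPRs under the Windows x64 ABI.
            out("rax") _, out("rcx") _, out("rdx") _,
            out("r8") _, out("r9") _, out("r10") _, out("r11") _,
            // Only xmm0-xmm5 are volatile; xmm6-xmm15 are callee-saved
            // (though their upper ymm/zmm halves are not preserved).
            out("xmm0") _, out("xmm1") _, out("xmm2") _,
            out("xmm3") _, out("xmm4") _, out("xmm5") _,
        );
    }
    x
}

fn main() {
    #[cfg(all(target_arch = "x86_64", target_os = "windows"))]
    assert_eq!(double(21), 42);
}
```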
[Godbolt link](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,selection:(endColumn:24,endLineNumber:5,positionColumn:24,positionLineNumber:5,selectionStartColumn:24,selectionStartLineNumber:5,startColumn:24,startLineNumber:5),source:'%23%5Bno_mangle%5D%0Apub+fn+foo()+%7B%0A++++unsafe+%7B%0A++++++++std::arch::asm!!+%7B%0A++++++++++++%22call+bar%22,%0A++++++++++++clobber_abi(%22C%22),%0A++++++++%7D%0A++++%7D%0A%7D%0A'),l:'5',n:'1',o:'Rust+source+%231',t:'0')),k:33.333333333333336,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:r1790,filters:(b:'1',binary:'1',binaryObject:'1',commentOnly:'1',debugCalls:'1',demangle:'0',directives:'1',execute:'1',intel:'0',libraryCode:'1',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:1,lang:rust,libs:!(),options:'--target+x86_64-pc-windows-msvc+-C+opt-level%3D3',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+rustc+1.79.0+(Editor+%231)',t:'0')),k:33.333333333333336,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compilerName:'x64+msvc+v19.latest',editorid:1,fontScale:14,fontUsePx:'0',j:1,wrap:'1'),l:'5',n:'0',o:'Output+of+rustc+1.79.0+(Compiler+%231)',t:'0')),k:33.33333333333333,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',n:'0',o:'',t:'0')),version:4)
Asm:

I expected that `foo` would just establish a frame and call `bar` without saving/restoring the XMM registers.

Meta

`rustc --version --verbose`:

But this repros in nightly.