Open glandium opened 2 years ago
Rust seems to just mark all struct arguments as byval: https://github.com/rust-lang/rust/blob/77e7e88567691068f5fba288618023368882d60b/compiler/rustc_target/src/abi/call/x86.rs#L57
While clang will pass certain aggregates directly: https://github.com/llvm/llvm-project/blob/1aaba40dcbe8fdc93d825d1f4e22edaa3e9aa5b1/clang/lib/CodeGen/TargetInfo.cpp#L1875-L1883
Another instance of @eddyb's favorite bug, https://github.com/rust-lang/rust/issues/65111.
From your description, I don't really see how this used to work before the referenced LLVM change, as the ABIs are completely different -- I guess you just ended up interpreting the pointer address as the value contained in the struct and that happened to "work".
Before the change, the inlining wasn't happening so we ended up with a normal call, and the rust callee is just fine too because it compiled down to the proper ABI for the function. What screws everything is that inlining now happens.
@Gankra did you run the abi checker for some i686 targets?
From your description, I don't really see how this used to work before the referenced LLVM change, as the ABIs are completely different -- I guess you just ended up interpreting the pointer address as the value contained in the struct and that happened to "work".
I believe %X* byval(%X)
and %X
are equivalent when stack-passing happens for the latter (IOW, my understanding is that byval
forces stack-passing and then behaves like ref
on Rust pattern bindings), though the pointer form may avoid some inefficiencies (and will always cause stack-passing).
IIRC x86 32-bit ABIs don't have any registers available for arguments "by default" (and the proliferation of e.g. thiscall
, regcall
, fastcall
etc. is about adding registers at all).
Any ABI that doesn't use byval
will effectively do quasi-SSA stack-passing once it runs out of registers, and so far I don't even know if we copied this from Clang or if rustc
is uniquely inefficient for e.g. large arrays passed by value - the worst case would likely be one where the first few scalar components of the type still use registers, so the value is fragmented (with e.g. 8 registers reserved for argument-passing, I believe [usize; 9]
will put the first 8 elements into one register each and then the 9th one on the stack).
Though this discussion is further complicated by:
byval
make_indirect
, which replaces values with pointers before register-vs-stack decisions
byval
would)extern "Rust"
at the moment)byval
would result in worse IR, e.g. %X** byval(%X*)
when passing X
byval
for stack-passing individual scalars (which appears to be what Clang is doing here to get the simpler i32
)cast_to
, used to make the scalar components register-sized
[u8; 9]
into [u32; 3]
on a platform with 32-bit integers[usize; 9]
and [u8; 9]
(even though if LLVM ever saw [9 x i8]
, it would end up using one byte per register)From a quick check, it seems (older) 32-bit ABIs are most susceptible to this inefficiency: https://github.com/rust-lang/rust/blob/fe217c28ffc6955f0927d8e8715d43d727debe5a/compiler/rustc_target/src/abi/call/arm.rs#L71-L73 https://github.com/rust-lang/rust/blob/fe217c28ffc6955f0927d8e8715d43d727debe5a/compiler/rustc_target/src/abi/call/sparc.rs#L24-L26 https://github.com/rust-lang/rust/blob/fe217c28ffc6955f0927d8e8715d43d727debe5a/compiler/rustc_target/src/abi/call/mips.rs#L24-L26
Even a 64-bit one (I assume because it's still an old ABI): https://github.com/rust-lang/rust/blob/fe217c28ffc6955f0927d8e8715d43d727debe5a/compiler/rustc_target/src/abi/call/powerpc64.rs#L114
MIPS64 is also affected, but only for struct
s, not arrays (which makes me wonder if we're actually currently wrong for "struct
s that contain large arrays").
Given sufficient ABI information, LLVM itself should be able to "relax" byval
but I doubt "explicit -> implicit" strength reductions like that would be uncontroversial.
This is one of my gripes with LLVM and ABI: it forces the frontend to think about ABI distinctions like:
byval
)inreg
, swift{error,context}
, etc.)... but only some of the time, not all of the time, and that leads to both multiple ways to represent certain behaviors (like the one discussed here, with byval
vs immediate-passing), and also being forced to use convoluted conversions (like what cast_to
causes) in other cases (so LLVM optimizations like inlining have to work extra hard to remove the RustType -> [usize; N] -> RustType
roundtrip).
I recently wrote something related to this (Ctrl+F "ABI mapping" in https://github.com/rust-lang/rust/issues/100698#issuecomment-1232448683), and I still believe it would be one of the nicer solutions overall: decouple "stack slots and SSA scalars/vectors" call dataflow (friendlier to IPO in general - e.g. inlining, but not only) from exporting a function with specific register/stack assignments.
I believe
%X* byval(%X)
and%X
are equivalent when stack-passing happens for the latter (IOW, my understanding is thatbyval
forces stack-passing and then behaves likeref
on Rust pattern bindings), though the pointer form may avoid some inefficiencies (and will always cause stack-passing).
Thanks, this is the bit I was missing! The ABIs are incompatible at the LLVM IR level, but become compatible post-legalization.
That does limit the issue to the cross-language LTO case only.
Given sufficient ABI information, LLVM itself should be able to "relax" byval but I doubt "explicit -> implicit" strength reductions like that would be uncontroversial.
LLVM will generally try to relax byval(%T)
into individual SROAd %T
arguments as part of argument promotion -- but this is only for functions with internal linkage, not for the case which you're talking about (where I doubt such a transform would be legal, because it makes the call UB at the IR level due to ABI mismatch).
The rest
Yes, LLVM's ABI handling is a big pile of poo ... your suggestion does sound reasonable on the surface, but I'm not volunteering to implement it :P
Regardless of the specific IR representation, something that would help a lot is to extract Clang's ABI calculation into something reusable by other frontends (discussed a bit at https://discourse.llvm.org/t/rfc-targetinfo-library/64342, though I doubt something will come of that). While rustc couldn't use that directly (due to other codegen backends), at least it could be used to our computed ABI is correct.
something that would help a lot is to extract Clang's ABI calculation into something reusable by other frontends
Heh, something like that is why rustc_target
avoids dependencies on the rest of the compiler, and is generic over the concept of a semantic type and contexts that can provide more information about those semantic types (tho recent changes motivated by performance have been compromising a bit of it, you still only need an arena to use the facilities AFAICT).
My long-term hopes with rustc_target::abi::call
was to split "classifying an input/output type" from "assign registers/stack to an input/output" (the latter could be done using declarative information in "classifications"), but I never got around to spend more time on it.
(the rough idea is that once you reduce a type and its layout to the set of possible ABI encodings that apply, it's a "resource calculus" from there on, like the "stateful" part of these algorithms ~always looks like "number of remaining registers per bank" or "stack offset/misalignment", and all the decisions are like "do I fit", i.e. "are there available resources for me to consume")
But for rustc_target
to be really good at this, we'd have to go all the way to exact register/stack mappings (which e.g. the Cranelift backend could maybe use directly) and then "reverse engineer" that back into LLVM IR (and if LLVM is willing to expose its own decisions back to us, we could check our own understanding - right now AFAIK this would only be possible by abusing assembly or DWARF, and that's not great).
WG-prioritization assigning priority (Zulip discussion).
@rustbot label -I-prioritize +P-high regression-untriaged
@Gankra did you run the abi checker for some i686 targets?
I have not, it's on my TODO list to figure out how platforms-that-aren't-host-but-can-run-on-host should be handled by abi-checker, since that's a lot easier than the full cross-compile case but also I'm not sure what problems come up if you build an x86_32 dylib and load it into an x64 process on the various platforms (esp when passing opaque pointers across the dylib boundary).
edit: actually maybe this would be fine if I just make you rebuild the harness itself too...
More broadly, I'm not sure abi-cafe would be able to catch this since the ultimate ABI is correct, it's just that llvm freaks out after cross-lang LTO. Maybe this is something abi-cafe should support checking? I'm not sure how hard it is to setup.
llvm has had other bugs like this before too, right? istr something wonky about inlining functions compiled for different "modes" like simd or thumb?
But for rustc_target to be really good at this, we'd have to go all the way to exact register/stack mappings (which e.g. the Cranelift backend could maybe use directly)
Cranelift doesn't allow assigning exact registers. Cranelift is pretty much exactly on the same level as what rustc_target exposes. You tell it the calling convention (for assigning registers and stack locations) and a list of input and output AbiParam
where each AbiParam
has a primitive type (i8, i16, i32, i64, i128, f32, f64 or vector types), argument extension (none, zero extension, sign extension) and argument purpose (normal, struct argument of given size, struct return, and a couple of ones only meant for use by wasm runtimes). Rustc_target tells exactly how to perform the decomposition from rust values to Cranelift arguments and return values. Unlike LLVM it doesn't have struct types, so struct arguments just specify size (i believe it will also need to specify alignment in the future) and struct returns are performed by returning multiple values rather than a single struct value.
@Gankra cargo run --target i686-unknown-linux-gnu
works just fine on an x86_64 host and compiles the tests for i686. I get the following results:
Final Results:
[...]
ptr::c::rustc_calls_cc failed
ptr::c::cc_calls_rustc failed
[...]
ui128::c::rustc_calls_cc fixed (test was busted, congrats!)
ui128::c::cc_calls_rustc fixed (test was busted, congrats!)
ui128::c::cc_calls_cc failed
[...]
231 tests run - 44 passed, 0 busted, 5 failed, 182 skipped
Error: TestsFailed
The errors I get are:
compiling ptr::c::rustc_calls_cc
running: "rustc" "--crate-type" "staticlib" "--out-dir" "target/temp/" "--target" "i686-unknown-linux-gnu" "-Cmetadata=ptr_c_rustc_caller" "generated_impls/rustc/ptr_c_rustc_caller.rs"
Failed to build test: rust compile error
warning: associated function `new` is never used
--> generated_impls/rustc/ptr_c_rustc_caller.rs:33:8
|
33 | fn new(val: i128) -> Self {
| ^^^
|
= note: `#[warn(dead_code)]` on by default
warning: associated function `new` is never used
--> generated_impls/rustc/ptr_c_rustc_caller.rs:42:8
|
42 | fn new(val: u128) -> Self {
| ^^^
error: literal out of range for `usize`
--> generated_impls/rustc/ptr_c_rustc_caller.rs:718:29
|
718 | let arg0: *mut () = 0x706050403020100 as *mut ();
| ^^^^^^^^^^^^^^^^^
|
= note: the literal `0x706050403020100` (decimal `506097522914230528`) does not fit into the type `usize` and will become `50462976usize`
= note: `#[deny(overflowing_literals)]` on by default
[...]
and
compiling sysv_i128_emulation::handwritten::rustc_calls_cc
running: "rustc" "--crate-type" "staticlib" "--out-dir" "target/temp/" "--target" "i686-unknown-linux-gnu" "-Cmetadata=sysv_i128_emulation_handwritten_rustc_caller" "handwritten_impls/rustc
/sysv_i128_emulation_handwritten_rustc_caller.rs"
running: "x86_64-linux-gnu-gcc" "-O0" "-ffunction-sections" "-fdata-sections" "-fPIC" "-m32" "-march=i686" "-Wall" "-Wextra" "-o" "target/temp/handwritten_impls/cc/sysv_i128_emulation_handw
ritten_cc_callee.o" "-c" "handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c"
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:25: error: expected declaration specifiers or ‘...’ before ‘__int128’ 36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128); | ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:35: error: expected declaration specifiers or ‘...’ before ‘__int128’ 36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128); | ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:52: error: expected declaration specifiers or ‘...’ before ‘__int128’ 36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128); | ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:36:71: error: expected declaration specifiers or ‘...’ before ‘__int128’
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning=
cargo:warning= 36 | typedef void (*functy1)(__int128, __int128, float, __int128, uint8_t, __int128); | ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:39:27: error: expected declaration specifiers or ‘...’ before ‘__int128’ 39 | void callee_native_layout(__int128* arg0, __int128* arg1, __int128* arg2) { | ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:39:43: error: expected declaration specifiers or ‘...’ before ‘__int128’ 39 | void callee_native_layout(__int128* arg0, __int128* arg1, __int128* arg2) { | ^~~~~~~~handwritten_impls/cc/sysv_i128_emulation_handwritten_cc_callee.c:39:59: error: expected declaration specifiers or ‘...’ before ‘__int128’ 39 | void callee_native_layout(__int128* arg0, __int128* arg1, __int128* arg2) {
[...]
Ok so I should just figure out a way to optionally flip on LTO for clang_calls_rustc
, nice!
Discussed during T-compiler meeting (notes), namely this part:
we have an ABI that is incompatible with Clang's C ABI but LLVM is inlining our code together which violates the ABI constraints leading to the unsoundness. So one fix here is to get LLVM to not do that another is to change our ABI to match Clang's in some or all cases. There's also a possibility of teaching LLVM to do the inlining but in some way that respects ABI requirements?
@rustbot label -i-compiler-nominated
I looked a bit closer into what clang is actually doing here. I referenced the wrong code above, the relevant bit is actually this: https://github.com/llvm/llvm-project/blob/1aaba40dcbe8fdc93d825d1f4e22edaa3e9aa5b1/clang/lib/CodeGen/TargetInfo.cpp#L1891-L1900
In words, what clang is going is to unpack structs <= 128 bits where all parts are 32-bit or 64-bit primitives (pointer, integer, float, enum). Some samples: https://clang.godbolt.org/z/bMjYfxnjj
This is being done on the premise that it does not change the final ABI, while making it easier to optimize the resulting IR. There's also this special gem hidden in there: https://github.com/llvm/llvm-project/blob/1aaba40dcbe8fdc93d825d1f4e22edaa3e9aa5b1/clang/lib/CodeGen/TargetInfo.cpp#L1487-L1492
That is, if it's C++ code it actually makes a difference whether you use struct
or class
to declare the structure -- the former will be unpacked, while the latter will be passed byval. Unless you're on Windows.
And another extra peculiar case: If we combine this with the fastcall calling convention, we get this: https://clang.godbolt.org/z/MW797ehfj Now we have an additional inreg i32 undef argument on the call, before the i32 that actually passes the unpacked struct.
Discussed during T-compiler P-high review.. There are a number of short-comings identified here. We should figure out how to best invest our effort to address them, even partially.
I think the next step is to write an MCP proposing some kind of ABI validation (see https://github.com/rust-lang/rust/issues/65111), perhaps via abi-cafe
I thought that in C, structs are never passed in a register (they are not implicit transparent like our structs)? How come clang passes this as a plain i32
rather than a struct with an i32
field?
Looking at the clang code
// Don't do this for the MCU if there are still free integer registers
// (see X86_64 ABI for full explanation).
So... they get away with that because they are sure this i32
won't be passed in a register, and for stack passing i32
vs { i32 }
makes no difference? And ironically, if your architecture has more registers for argument passing then clang will actually get worse at optimizing your code because it can't do this "optimization" any more?
I see no practical way for us to fix this alone. Clearly clang itself only considers itself to be beholden to the final ABI of the produced binary, and feels free to arbitrarily change the LLVM-IR-level ABI as long as the asm-level ABI stays the same. I doubt they make stability guarantees for the LLVM-IR-level ABI, so trying to match them seems hopeless.
So IMO a systematic fix requires xlang LTO to be more careful, and in particular xlang inlining needs to be able to actually handle these kinds of ABI mismatches.
i32
and { i32 }
are handled identically by LLVM afaik. A struct is splatted into it's individual fields when determining where to pass arguments. Even something like { i32, i32, i32, i32, i32 }
is passed in 5 registers, despite the x86_64 SystemV call conv stating that the equivalent C struct should be passed on the stack. Instead the frontend is required to lower it to ptr byval(<some LLVM type with the same size as the C struct>) align <align of the C struct> %arg
. Rustc wouldn't even pass { i32, i32, i32, i32, i32 }
as the type for the byval
, instead it passes [20 x i8]
which has the same size.
I thought that in C, structs are never passed in a register (they are not implicit transparent like our structs)?
Small structs are often passed in registers, but it depends on the calling convention. The x86_64 SystemV call conv for example passes many structs less than 2 registers in size in registers. While the wasm C abi (the official one, not the broken one wasm32-unknown-unknown uses) passes every struct which contains only a single scalar as field by-value.
After https://github.com/llvm/llvm-project/commit/6c8adc505471542be38bd71d1000062daa46d7bc, inlining in cross-language LTO happens in cases where it didn't happen before, including cases where things go very bad (more details in https://bugzilla.mozilla.org/show_bug.cgi?id=1789779#c7)
It seems to boil down to LLVM not liking that rust defines its
extern "C"
functions in significantly different ways than clang does for the C/C++ code that calls it. For example:The caller:
Rust defines the function as:
while the caller C code does this:
The equivalent C code:
defines the function as:
(Ironically, rustc transforms a non-
extern "C"
version of the function to the same declaration as clang's)Arguably, there's an underlying LLVM bug not being able to handle this case, which /could/ be considered fine, but I'm not sure it's supposed to be.
It's worth noting that rustc does not use a byval for e.g. x86_64-unknown-linux-gnu.
Cc: @nikic