Open hanna-kruppe opened 7 months ago
WG-prioritization assigning priority (Zulip discussion).
@rustbot label -I-prioritize +P-medium
I've run into this problem again, but this time it's a C callback that's returning the union. As I don't control it I can't change its signature to manually match the ABI. I think I'll have to move more processing code into my C support library.
A possible workaround might be to use a technically incorrect function signature on the Rust side that is lowered to the correct ABI on wasm targets by current (and future) rustc - pretending the callbacks get an out-pointer instead of returning by value / receive arguments by reference instead of by value. Pretty ugly and wasm-specific, but maybe less ugly than additional C code that is pointless on non-wasm platforms?
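A rough sketch of what that could look like, assuming the callback takes one `u32` argument and that Clang's wasm lowering passes the out-pointer as the first parameter. `DispatchValue`, `Callback`, `call_callback` and `sample_callback` are made-up names for illustration, not anything from the real project:

```rust
use std::ffi::c_void;

/// Hypothetical stand-in for the union the C callback returns.
#[repr(C)]
#[derive(Clone, Copy)]
pub union DispatchValue {
    pub num: u32,
    pub ptr: *const c_void,
}

// On wasm, deliberately declare the "wrong" signature: the callback is
// assumed to write its result through a leading out-pointer.
#[cfg(target_family = "wasm")]
pub type Callback = unsafe extern "C" fn(out: *mut DispatchValue, id: u32);

// Everywhere else, use the signature that matches the C declaration.
#[cfg(not(target_family = "wasm"))]
pub type Callback = unsafe extern "C" fn(id: u32) -> DispatchValue;

/// Calls the callback and hands back the union by value on every target.
#[cfg(target_family = "wasm")]
pub unsafe fn call_callback(cb: Callback, id: u32) -> DispatchValue {
    let mut slot = std::mem::MaybeUninit::<DispatchValue>::uninit();
    unsafe {
        cb(slot.as_mut_ptr(), id);
        slot.assume_init()
    }
}

#[cfg(not(target_family = "wasm"))]
pub unsafe fn call_callback(cb: Callback, id: u32) -> DispatchValue {
    unsafe { cb(id) }
}

// Sample callbacks written in Rust so the sketch can run on its own;
// in the real scenario the callback would come from C.
#[cfg(target_family = "wasm")]
unsafe extern "C" fn sample_callback(out: *mut DispatchValue, id: u32) {
    unsafe { out.write(DispatchValue { num: id + 1 }) };
}

#[cfg(not(target_family = "wasm"))]
unsafe extern "C" fn sample_callback(id: u32) -> DispatchValue {
    DispatchValue { num: id + 1 }
}

fn main() {
    let value = unsafe { call_callback(sample_callback, 41) };
    println!("{}", unsafe { value.num });
}
```

The platform-conditional part stays confined to the type alias and one wrapper, so callers only ever see `call_callback`.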
@hanna-kruppe Hmm, that might work, but it would mean I'd need to add platform conditional code in a number of places, vs a small C shim which works everywhere.
Edit: I'm having trouble even getting a shim working. It works fine when compiled to x86_64, but I can't get it to work in wasm - it's like the DispatchRock is having its value modified.
@hanna-kruppe On the chance that you have any time to take a look, I pushed my broken code to https://github.com/curiousdannii/emglken/tree/remglk_rs_broken_unions
Hmm, just found a potential solution: add a dummy union variant on the Rust side that is 64 bits. I think that means it won't be considered a singleton union, but C just ignores the extra word. I might be able to remove all my manual shimming this way?
Yep, that seems to work perfectly! And when this bug eventually gets fixed, all I'll need to do is remove the dummy variant. Much cleaner. :)
That's pretty risky. There may be targets where it causes Rust and C to disagree on whether the union should be passed/returned in memory. More importantly, both sides will now disagree on how large the type is on targets with 32-bit pointers. Like other ABI mismatches, that may not cause breakage immediately but will fail in very spectacular ways sooner or later. For example, when Rust returns the union to C via an out-pointer, either because the source code is written like that or because that's the ABI lowering for returning by value, Rust may write a full eight bytes while the C side only reserved four bytes. Adjacent data on the stack or elsewhere will then be clobbered. In simple cases, returning the union may be compiled down to storing four bytes because it's obvious to the optimizer that only a four-byte field is initialized (that's why I had to add -Zmir-opt-level=0 to the linked example). But you shouldn't rely on that because you'll constantly be one refactoring or compiler update away from a very "fun" debugging session.
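To make the size disagreement concrete, here is a minimal sketch. The type names are hypothetical, and the pointer field is modelled as a `u32` so that the numbers match a 32-bit-pointer target such as wasm32 even when this is run on a 64-bit host:

```rust
use std::mem::size_of;

/// What the C header describes, as seen on a target with 32-bit pointers.
#[repr(C)]
#[derive(Clone, Copy)]
union CSideValue {
    num: u32,
    ptr32: u32, // stands in for a 32-bit pointer
}

/// The Rust-side definition after adding the 64-bit dummy variant.
#[repr(C)]
#[derive(Clone, Copy)]
union RustSideValue {
    num: u32,
    ptr32: u32, // stands in for a 32-bit pointer
    _dummy: u64,
}

fn main() {
    // C reserves this many bytes for the value...
    println!("C side:    {} bytes", size_of::<CSideValue>());
    // ...while Rust may copy this many when returning it through an out-pointer.
    println!("Rust side: {} bytes", size_of::<RustSideValue>());
}
```

The four-byte difference is the part that can land on whatever sits next to the C side's slot.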
Here's another example that exhibits the problem even when compiled with optimizations, simplified from code in the linked commit. Because the union value is loaded from memory, not constructed in place, it's always returned by copying eight bytes, so I think you already have the bug I predicted in my last comment.
If I also added the dummy variant to the C union that would negate the risk, right? Or do you think the compiler is smart enough to optimise that away? It wouldn't optimise it away on both sides?
I could also make my library write to the dummy variant if that would help it prevent being optimised away.
Both C and Rust will follow the type layout and ABI rules for the type you've written down (modulo bugs such as this one and assuming repr(C) on the Rust definition). Problems happen because the type definitions, and hence the layout / ABI, differ. Compiler optimizations are not really relevant; they only affect how and when the problems manifest. If you also change the C side to have it work with the same union type as Rust, with the extra variant, that ABI mismatch indeed disappears. I guess in your particular case that's quite feasible, since you're in control of all the relevant C code including the header file in question.
Some notes from looking into this today (updating my atrophied knowledge of rustc's layout/ABI internals along the way):
struct Empty {} and empty arrays are ignored for the purpose of this rule.
TyAndLayout::homogeneous_aggregate is defined, this applies if all non-ZST union fields ultimately boil down to the same Reg, e.g.:
Reg, that's why Dannii ran into this with union { u32, *const _ }.
Option<&T> is "the same" as usize.
union { u32, u8 } is not considered singleton even though a by-value u8 argument would be promoted to u32 on wasm.
repr(C), repr(int), and repr(C, int).
repr(C) enums, at least in simple cases, because Clang also ignores some ZST union members.
repr(int) enums, as well as repr(C, int), there is an ABI mismatch vs. the C equivalent. The layout of such enums is union { Variant1, Variant2, ... } where each variant is a struct with a field for the discriminant. Clang doesn't consider that a singleton union, even if the structs are otherwise identical for ABI purposes, but rustc will happily boil down each variant to its discriminant if there are no other fields or the other fields are 1ZSTs.

All of this makes me think there's probably a chance for a quick and dirty fix by just special casing unions somehow. At least, if nobody cares to delve further into the details of how Clang handles empty structs and arrays in all cases. I'm not itching to write a patch, though, at least not until #119183 is settled.
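For reference, the shapes mentioned in these notes can be written down roughly as follows. This is only a sketch with made-up type names: the comments repeat what the notes say about how rustc classifies each one on the affected wasm targets, and the program itself only prints sizes on whatever host it runs on.

```rust
use std::ffi::c_void;
use std::mem::size_of;

/// Per the notes: both fields boil down to the same Reg on wasm32,
/// so rustc treats this as a singleton and passes it as a scalar.
#[repr(C)]
#[derive(Clone, Copy)]
union IntOrPtr {
    num: u32,
    ptr: *const c_void,
}

/// Per the notes: not considered singleton, even though a by-value
/// u8 argument would be promoted to u32 on wasm.
#[repr(C)]
#[derive(Clone, Copy)]
union IntOrByte {
    word: u32,
    byte: u8,
}

/// A repr(int) enum whose only payload is a zero-sized type. Its layout is
/// a union of per-variant structs that each start with the discriminant;
/// per the notes, rustc boils each variant down to that discriminant.
#[repr(u32)]
#[derive(Clone, Copy)]
enum Tagged {
    A,
    B(()),
}

fn main() {
    println!("IntOrPtr:  {} bytes", size_of::<IntOrPtr>());
    println!("IntOrByte: {} bytes", size_of::<IntOrByte>());
    println!("Tagged:    {} bytes", size_of::<Tagged>());
    // Construct each type once so the definitions are actually exercised.
    let _ = (IntOrPtr { num: 0 }, IntOrByte { byte: 0 }, Tagged::A, Tagged::B(()));
}
```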
On wasm32-unknown-emscripten and wasm32-wasi, rustc implements the C ABI for some unions incorrectly, i.e., different from Clang. Minimized example:
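A hypothetical sketch of the kind of code that triggers this, assuming the union { u32, *const _ } shape mentioned in the comments. The names `Value` and `roundtrip` are illustrative, and this is not necessarily the exact code behind the compiler explorer link:

```rust
use std::ffi::c_void;

/// A repr(C) union of a 32-bit integer and a raw pointer.
#[repr(C)]
#[derive(Clone, Copy)]
pub union Value {
    pub num: u32,
    pub ptr: *const c_void,
}

/// Takes and returns the union by value across the C ABI.
/// The interesting part is how this signature is lowered on wasm32.
pub extern "C" fn roundtrip(value: Value) -> Value {
    value
}

fn main() {
    let out = roundtrip(Value { num: 7 });
    // Reading a union field is unsafe; `num` is the field that was written.
    println!("{}", unsafe { out.num });
}
```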
I expected to see this happen: the resulting wasm code should pass and return the union indirectly, i.e. by pointers, as described in the C ABI document and implemented in Clang (compiler explorer).
Instead, this happened: the union is passed and returned as a single scalar (i32). See the previous compiler explorer link, and I also see it locally for wasm32-wasi (too lazy to install a whole emscripten toolchain):
The definition of "singleton" union in the C ABI document ("recursively contains just a single scalar value") may be considered ambiguous, but clearly Clang interprets it differently from rustc, so something will have to give. I have not tried to exhaustively explore in which cases they differ, the above example may not be the only one.
Compare and contrast https://github.com/rust-lang/rust/issues/71871 - as discussed there, the emscripten and wasi targets have long since been fixed to match Clang's ABI, with only wasm32-unknown-unknown lagging behind. However, it seems that the fixed C ABI on emscripten and wasi targets is still incorrect in some cases around unions.
cc @curiousdannii, who encountered this in a real project (https://github.com/rust-lang/cc-rs/issues/954)
Meta
rustc +nightly --version --verbose:
(Also happens on 1.76 stable.)