rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.39k stars 12.59k forks source link

Clang vs wasm32-{emscripten,wasi} rustc C ABI mismatch w.r.t. "singleton" unions #121408

Open hanna-kruppe opened 7 months ago

hanna-kruppe commented 7 months ago

On wasm32-unknown-emscripten and wasm32-wasi, rustc implements the C ABI for some unions incorrectly, i.e., different from Clang. Minimized example:

#[repr(C)]
pub union U {
    a: u32,
    b: u32,
}

#[no_mangle]
pub extern "C" fn unwrap_union(u: U) -> u32 {
    unsafe { u.a }
}

#[no_mangle]
pub extern "C" fn make_union() -> U {
    U { a: 0 }
}

I expected to see this happen: the resulting wasm code should pass and return the union indirectly, i.e. by pointers, as described in the C ABI document and implemented in Clang (compiler explorer).

Instead, this happened: the union is passed and returned as a single scalar (i32). See the previous compiler explorer link, and I also see it locally for wasm32-wasi (too lazy to install a whole emscripten toolchain):

```wat $ rustc +nightly -O cabi-union.rs --crate-type=cdylib --target wasm32-wasi $ wasm-tools print cabi_union.wasm (module $cabi_union.wasm (type (;0;) (func (param i32) (result i32))) (type (;1;) (func (result i32))) (type (;2;) (func)) (func $unwrap_union (;0;) (type 0) (param i32) (result i32) local.get 0 ) (func $make_union (;1;) (type 1) (result i32) i32.const 0 ) (func $dummy (;2;) (type 2)) (func $__wasm_call_dtors (;3;) (type 2) call $dummy call $dummy ) (func $unwrap_union.command_export (;4;) (type 0) (param i32) (result i32) local.get 0 call $unwrap_union call $__wasm_call_dtors ) (func $make_union.command_export (;5;) (type 1) (result i32) call $make_union call $__wasm_call_dtors ) (table (;0;) 1 1 funcref) (memory (;0;) 16) (global $__stack_pointer (;0;) (mut i32) i32.const 1048576) (export "memory" (memory 0)) (export "unwrap_union" (func $unwrap_union.command_export)) (export "make_union" (func $make_union.command_export)) (@producers (language "Rust" "") (processed-by "rustc" "1.78.0-nightly (2bf78d12d 2024-02-18)") (processed-by "clang" "16.0.4 (https://github.com/llvm/llvm-project ae42196bc493ffe877a7e3dff8be32035dea4d07)") ) ) ```

The definition of "singleton" union in the C ABI document ("recursively contains just a single scalar value") may be considered ambiguous, but clearly Clang interprets it differently from rustc, so something will have to give. I have not tried to exhaustively explore in which cases they differ, the above example may not be the only one.

Compare and contrast https://github.com/rust-lang/rust/issues/71871 - as discussed there, the emscripten and wasi targets have long since been fixed to match Clang's ABI, with only wasm32-unknown-unknown lagging behind. However, it seems that the fixed C ABI on emscripten and wasi targets is still incorrect in some cases around unions.

cc @curiousdannii, who encountered this in a real project (https://github.com/rust-lang/cc-rs/issues/954)

Meta

rustc +nightly --version --verbose:

rustc 1.78.0-nightly (2bf78d12d 2024-02-18)
binary: rustc
commit-hash: 2bf78d12d33ae02d10010309a0d85dd04e7cff72
commit-date: 2024-02-18
host: x86_64-unknown-linux-gnu
release: 1.78.0-nightly
LLVM version: 18.1.0

(Also happens on 1.76 stable.)

apiraino commented 7 months ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

curiousdannii commented 7 months ago

I've run into this problem again, but this time it's a C callback that's returning the union. As I don't control it I can't change its signature to manually match the ABI. I think I'll have to move more processing code into my C support library.

hanna-kruppe commented 7 months ago

A possible workaround might be to use a technically incorrect function signature on the Rust side that is lowered to the correct ABI on wasm targets by current (and future) rustc - pretending the callbacks get an out-pointer instead of returning by value / receive arguments by reference instead of by value. Pretty ugly and wasm-specific, but maybe less ugly than additional C code that is pointless on non-wasm platforms?

curiousdannii commented 7 months ago

@hanna-kruppe Hmm, that might work, but it would mean I'd need to add platform conditional code in a number of places, vs a small C shim which works everywhere.

Edit: I'm having trouble even getting a shim working. It works fine when compiled to x86_64, but I can't get it to work in wasm - it's like the DispatchRock is having its value modified.

@hanna-kruppe On the chance that you have any time to take a look, I pushed my broken code to https://github.com/curiousdannii/emglken/tree/remglk_rs_broken_unions

This builds and runs glulxe ``` ./src/build.sh ./bin/emglken.js tests/glulxercise.ulx ``` Then enter anything. You should see something like this (60664 turns into 60684 and then into 54940) ``` gidispatch_call_array_register 0xef0e0 &+#!Cn 60664 0xed1c retain_array 0xef0e0 0xed1c 977032 print_disprock 60684 unretain_array 0xef0e0 977032 gidispatch_call_array_unregister 0xef0e0 &+#!Cn 54940 Glulxe fatal error: Mismatched array reference in Glk call. ``` Where as in x86_64 it works (280594112 is retained through it all): ``` gidispatch_call_array_register 0x55f910b98d60 &+#!Cn 280594112 0x7ffeb52fd1d0 retain_array 0x55f910b98d60 0x7ffeb52fd1d0 94528215811776 print_disprock 280594112 unretain_array 0x55f910b98d60 94528215811776 gidispatch_call_array_unregister 0x55f910b98d60 &+#!Cn 280594112 ```
curiousdannii commented 7 months ago

Hmm, just found a potential solution: add an dummy union variant on the rust side that is 64 bits. I think that means it won't be considered a singleton union, but C just ignores the extra word. I might be able to remove all my manual shimming this way?

Yep, that seems to work perfectly! And when this bug eventually gets fixed, all I'll need to do is remove the dummy variant. Much cleaner. :)

hanna-kruppe commented 7 months ago

That's pretty risky. There may be targets where it causes Rust and C to disagree on whether the union should be passed/returned in memory. More importantly, both sides will now disagree on how large the type is on targets with 32 bit pointers. Like other ABI mismatches, that may not cause breakage immediately will fail in very spectacular ways sooner or later. For example, when Rust returns the union to C via an out-pointer, either because the source code is written like that or because that's the ABI lowering for returning by value, Rust may write a full eight bytes while the C side only reserved four bytes. Adjacent data in the stack or elsewhere will then be clobbered. In simple cases, returning the union may be compiled down to storing four bytes because it's obvious to the optimizer that only a four-byte field is initialized (that's why I had to add -Zmir-opt-level=0 to the linked example). But you shouldn't rely on that because you'll constantly be one refactoring or compiler update away a very "fun" debugging session.

hanna-kruppe commented 7 months ago

Here's another example that exhibits the problem even when compiled with optimizations, simplified from code in the linked commit. Because the union value is taken loaded from memory, not constructed in-place, it's always returned by copying eight bytes, so I think you already have the bug I predicted in my last comment.

curiousdannii commented 7 months ago

If I also added the dummy variant to the C union that would negate the risk, right? Or do you think the compiler is smart enough to optimise that away? It wouldn't optimise it away on both sides?

I could also make my library write to the dummy variant if that would help it prevent being optimised away.

hanna-kruppe commented 7 months ago

Both C and Rust will follow the type layout and ABI rules for the type you've written down (modulo bugs such as this one and assuming repr(C) on the Rust definition). Problems happen because the type definitions, and hence the layout / ABI, differs. Compiler optimizations are not really relevant, they only affect how and when the problems manifest. If you also change the C side to have it work with the same union type as Rust, with the extra variant, that ABI mismatch indeed disappears. I guess in your particular case that's quite feasible, since you're in control of all the relevant C code including the header file in question.

hanna-kruppe commented 6 months ago

Some notes from looking into this today (updating my atrophied knowledge of rustc's layout/ABI internals along the way):

All of this makes me think there's probably a chance for a quick and dirty fix by just special casing unions somehow. At least, if nobody cares to delve further into the details of how Clang handles empty structs and arrays in all cases. I'm not itching to write a patch, though, at least not until #119183 is settled.