The `extern "C"` ABI of x86 vector types depends on target features

RalfJung commented 9 months ago

The following program has UB, for very surprising reasons:

use std::mem::transmute;
#[cfg(target_arch = "x86")]
use std::arch::x86::*;
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

extern "C" fn no_target_feature(_dummy: f32, x: __m256) {
    let val = unsafe { transmute::<_, [u32; 8]>(x) };
    dbg!(val);
}

#[target_feature(enable = "avx")]
unsafe fn with_target_feature(x: __m256) {
  // Here we're seemingly just calling a safe function... but actually this call is UB!
  // Caller and callee do not agree on the ABI; specifically they disagree on how the
  // `__m256` argument gets passed.
  no_target_feature(0.0, x);
}

fn main() {
    assert!(is_x86_feature_detected!("avx"));
    // SAFETY: we checked that the `avx` feature is present.
    unsafe {
        with_target_feature(transmute([1; 8]));
    }
}

The reason is that the ABI of the __m256 type depends on the set of target features, so the caller (with_target_feature) and callee (no_target_feature) do not agree on how the argument should be passed. The result is a vector half-filled with junk. (The same issue also arises in the other direction, where the caller has fewer features than the callee: example here.)

I'm not sure if this is currently even properly documented? We are mentioning it in https://github.com/rust-lang/rust/pull/115476 but the program above doesn't involve any function pointers so we really cannot expect people to be aware of that part of the docs. We show a general warning about the type not being FFI-compatible, but that warning shows up a lot and anyway in this case both caller and callee are Rust functions!

I think we need to do better here, but backwards compatibility might make that hard. @chorman0773 suggested we should just reject functions like no_target_feature that take an AVX type by-val without having declared the AVX feature. That seems reasonable; a crater run would be needed to assess whether it breaks too much code. An alternative might be to have a deny-by-default lint that very clearly explains what is happening.

There are also details to work out wrt what exactly the lint should check. Newtypes around __m256 will have the same problem. What about other larger types that contain __m256? Behind a ptr indirection it's obviously fine, but what about (__m256, __m256)? If we apply the ScalarPair optimization this will be passed in registers even on x86.

A possible place to put the check could be somewhere around here.

In terms of process, I am not sure if an RFC is required; a t-compiler MCP might be sufficient. Currently we accept code that clearly doesn't do what it looks like it should do.

And finally -- are there any other targets (besides x86 and x86-64) that have target-features that affect the ABI? They should get the same treatment.

Note that this is different from https://github.com/rust-lang/rust/issues/116344 in two ways:

This issue only affects non-"Rust" ABIs; the other issue affects even the "Rust" ABI. (The Rust ABI passes __m256 by-pointer exactly to work-around this issue.)
This issue comes about when the user enables extra features (which is possible also locally via #[target_feature]), the other issue comes about when the user disables features (which is only possible on a per-crate level via -C, and if you're mixing crates with different -C flags then you're already already on very shaky grounds -- we do that with std but nobody else really gets to do that, I think).

Cc @workingjubilee more ABI fun ;)

chorman0773 commented 9 months ago

For a hard error, I think it probably should go through RFC. I can start drafting that now. As for what it would check, I'd expect a hard error on any time a simd type is passed by value and the appropriate feature is unavailable, even if the specific ABI wouldn't necessarily pass it in registers, including inside a structure, and the lint should probably check the same thing.

RalfJung commented 9 months ago

That can have potentially surprising effects, e.g. when a closure captures something of type __m256 and the closure gets passed by-value to a generic function.

It also means adding adding a private field that contains by-val repr(simd) data can be a semver-breaking change. Cc @ehuss

chorman0773 commented 9 months ago

It also means adding adding a private field that contains by-val repr(simd) data can be a semver-breaking change

This would already be the case with replacing several fields with a SIMD type. For example, changing struct Foo(u64,u64,u64,u64); to struct Foo(__m256i);

Although thinking about it, because features can be per-crate, this might be a problem on crates that specialize based on target features.

Perhaps #[repr(Rust)] types should be exempted and always passed indirectly when they contain SIMD types (but not #[repr(C)] types).

coastalwhite commented 9 months ago

With RISC-V, you have Zfinx and Zdinx which have hard-floats but have moved the floats to the normal general purpose registers. In this case, forbidding the passing of floats by-value, if the f (RISC-V's version of single-precision float) target is not present, is very restrictive and probably not what you would want. Otherwise, allowing the passing-by-value of floats between f and Zfinx is also UB.

RalfJung commented 9 months ago

But the ABI only changes when f is present or not present, I assume? When f is present, Zfinx just means more instructions can be used but the ABI still says f32 is passed on float registers? And when f is not present, softfloat and Zfinx use the same ABI?

So that sounds like whether or not f is present should be a different target triple. Code with vs without f simply cannot be linked together. But having or not having Zfinx doesn't make an ABI difference. Or am I misunderstanding?

coastalwhite commented 9 months ago

No, that is exactly it. Do I understand correctly that you essentially want to do something similar to thumb with the eabi and eabihf?

RalfJung commented 9 months ago

I'm not an expert on target details. :see_no_evil: Usually my purview ends when I have given precise abstract semantics to nasty pieces of unsafe Rust and MIR.^^ But every now and then terrible low-level CPU nightmares creep onto my territory and then I have to learn how to deal with them...

So, I don't know what ARM does with eabi vs eabihf. But it sounds like they are considering "hard float" and "soft float" two different ABIs that presumably use different registers for argument passing (so they are indeed different ABIs), and then have corresponding target triples to tell apart the ABIs. So yeah that sounds like what I am suggesting we should do for other targets as well. (We could also consider different ABI strings, like "C" vs "C-softfloat" or so. No idea if that makes any sense.)

Obviously we don't actually care whether the functions behind those ABIs use x87/d/f hardware instructions or do some softfloat stuff, we just care how arguments are passed. But sadly it seems that in LLVM, there's no way to say "you have the d,f target features available but please use the softfloat ABI anyway"? If that is something people need (enable d,f or x87,sse,sse2 on softfloat targets without affecting ABI) we should start talking with LLVM people about whether and how that could be achieved.

This may all be completely naive since I'm just coming at this with my principled approach of making everything safe or at least sound and giving it semantics at a high level; I rely on people like @chorman0773 to tell me when my ideas make no sense "in the field". ;)

Skgland commented 9 months ago

If with_target_feature calling no_target_feature is UB due do different target features, shouldn't main calling with_target_feature have the same problem? just the other way around? Or is this specific to "C" being a non Rust-ABI?

RalfJung commented 9 months ago

Yeah for the Rust ABI we do apply a work-around (it always passes such types in memory), so that call is fine. But the combination of extern fn and target_feature is a big minefield currently.

(I updated the example to actually use the Rust ABI for with_target_feature.)

Skgland commented 9 months ago

Would adding a inline(never) Rust ABI intermediate function be a valid workaround to prevent UB here? Unconditional code generation #[target_feature]: Note 1 making #[inline(never)] required as otherwise the compiler may inline the function and then we would be back to the incorrect feature set.

RalfJung commented 9 months ago

Inlining should be aware of target features, so inline(never) should not be needed. If it is needed that's a critical bug in the inliner that we should track and fix.

Yes such an intermediate Rust ABI function works around the problem successfully. But the fundamental problem remains with function pointers: we have to decide whether an extern "C" fn(__m256) has the AVX ABI or some fallback ABI, without having any clue where and how the function it points to was declared. We probably want it to have the AVX ABI since people that use AVX and interface between Rust and C code likely will have their C code built in a way that it uses the AVX ABI. This means all extern "C" fn declared in Rust that take __m256 as arguments should do this the AVX style, no matter the set of enabled target features -- or they should fail to build.

RalfJung commented 9 months ago

Ah, the inline actually does make a difference here... well that's bad, I opened https://github.com/rust-lang/rust/issues/116573 to track that.

briansmith commented 9 months ago

The compiler does report improper_ctypes_definitions warnings for the given code so I guess it isn't unexpected? It would e better to have an example that doesn't trigger improper_ctypes_definitions.

workingjubilee commented 9 months ago

The compiler does report improper_ctypes_definitions warnings for the given code so I guess it isn't unexpected? It would e better to have an example that doesn't trigger improper_ctypes_definitions.

And that's supposed to be just a warning. We should hard error if we know everything is fucked and there's no salvaging the compilation (whether hard erroring is appropriate here, I don't know yet).

RalfJung commented 9 months ago

There's no C code involved here so I would argue it is reasonable to silence the lint for the example. The lint talks about FFI, but the example does no FFI, this is all Rust-to-Rust calls.

briansmith commented 9 months ago

Is there an example that can trigger this with only FFI-safe types though?

chorman0773 commented 9 months ago

I've actually suppressed that lint for __m128 on sse3 only code (so xmm registers 100% exist) that was experiencing a perf regression when the function used default ABI. But also I wouldn't trust improper_ctypes{,_definitions} to not get blanket suppressed in FFI heavy code, so at the very least a lint for this should be it's own thing. improper_ctypes is a clippy lint in rustc.

tmandry commented 9 months ago

I don't think #[target_feature] should ever affect the ABI of function pointers called from within them, especially not extern "C", ones where the C ABI is defined statically at the TU level.

If we did anything with ABI based on #[target_feature], it would have to be encoded in the type of the function pointers somehow. So for example, an extern "C" function with an avx calling convention would really be extern "C+avx" or similar.

RalfJung commented 9 months ago

@tmandry yeah agreed. Sadly target_feature 1.0 was stabilized in a form where #[target_feature] does affect ABI, and core::arch SIMD types were stabilized in a form where -C target-feature affects ABI. That's causing this issue here and also https://github.com/rust-lang/rust/issues/116344.

RalfJung commented 9 months ago

@briansmith our ctypes lints seem to have enough gaps to be easily circumvented; here's a reproducer that the lint doesn't catch. It's using a closure to pass the value -- closures that capture a single field are newtypes around that field so they get the field's ABI, but the improper-ctypes lint ignores closures.

workingjubilee commented 9 months ago

More to the point, there's no formal definition of "FFI-safe" and "FFI-unsafe" in the Rust language. As far as I can tell, the actual thing that bears the obligation to be ABI-correct is when you do

extern "C" {
    fn c_func(arg: SomeType) -> ReturnType;
}

fn rust_func() -> ReturnType {
    let arg = SomeType::random();
    // here, the Rust caller actually discharges the obligation...
    // which is honestly slightly unfortunate:
    // in real scenarios, we might not be the author of the `extern "C"` block!
    unsafe { c_func(arg) }
}

RalfJung commented 9 months ago

Do we know if clang is doing anything clever here? I assume C has exactly the same problem, since their function types don't track target features either.

chorman0773 commented 9 months ago

clang does not fix this issue. See llvm-project#64706

RalfJung commented 9 months ago

That's a different issue (mismatch with GCC). This here is about mismatch within code compiled by the same compiler, but with different locally set target features.

chorman0773 commented 9 months ago

Ah right. It does, in fact, mismatch with itself without the attribute.

chorman0773 commented 9 months ago

(It also mismatches between -mavx and -mno-avx/__attribute__((target("avx"))))

RalfJung commented 9 months ago

Do you happen to have a godbolt link demonstrating that? :)

chorman0773 commented 9 months ago

https://godbolt.org/z/obhezje4n

Interpreting the assembly: In the global (-mavx) case, it passes __m256 in ymm registers, and returns them in ymm0. For the inactive case, it passes on the stack and returns in xmm0:xmm1. For the locally-enabled only case, it passes on the stack, and returns in ymm0 (according to Sys-V, there shouldn't be a difference with the globally enabled case).

Edit: I've added gcc's output for context. GCC has the correct output according to the ABI. Edit: For other x86 assembler readers... yes gcc is using an integer simd load and an floating-point simd store. Yes it is correct. Yes it is mildly infuriating.

RalfJung commented 8 months ago

Turns out that clang is able to detect and warn at least some of these cases. Clang also allows putting __attribute__((target("avx"))) annotations at function declarations and uses them to decide whether or not to warn/error. In contrast, we allow #[target_feature] only on definitions, not on imports (extern "C" {} blocks).

One way to resolve this problem would be a new post-monomorphization check: traverse the argument (and return) types of the function being built (if it's a non-Rust ABI), as well as the argument and return types of every non-Rust-ABI function signature we call. If we find any repr(simd) types in there that's not behind a pointer indirection, ensure that vectors of that size are supported by the available target features. I don't know if there's any good cross-target way to do this; if not, we'll need a big match on the target architecture and hard-code which features need to be enabled for which vector size.

This is stricter than what clang does, but it's also safer, so if we can get away with it I think this would be my preferred approach. We don't have a good central place for post-mono checks, so we'll need to figure out what is the best way to share this between the codegen backends and Miri. Also, such a check would likely be optimization-dependent, since if an ABI-incompatible function call gets optimized away we wouldn't see it post-mono. We have to do the check post-mono though since we support generic extern "C" fn and there is no trait for "does not contain a SIMD type by-value" (and it would be a breaking change to start requiring such a trait).

(I think this is what @chorman0773 said they are planning to do in their compiler.)

chorman0773 commented 8 months ago

Yep (in my case, right down at the codegen level - codegen-x86 would reject the function when finding parameter locations). I actually have an RFC-being-written for prescribing this behaviour at the rust level.

RalfJung commented 8 months ago

I actually have an RFC-being-written for prescribing this behaviour at the rust level.

Great!

Given that we currently do rather nonsensical things at the ABI level, I was going to suggest we should just implement a band-aid without waiting for an RFC. But that doesn't preclude designing something more proper. :)

chorman0773 commented 8 months ago

One note is this text that I put

When a function is called via an fn-item type (not a function pointer type), the target features of the callee are considered. [...] Implementations of Rust would be required to ensure this call is performed properly (with parameters passed/returned in the appropriate registers), which may entail the use of a shim function or alternative code generation techniques.

This may require some extra work with llvm, or on the rustc side.

RalfJung commented 8 months ago

Interesting, you seem to plan to accept more code than I would. Anyway we can discuss the RFC once it is ready. :)

chorman0773 commented 8 months ago

(Actually, I could probably open it in a less formal capacity - I think there is some pre-discussion to be had before filing a full RFC)

RalfJung commented 8 months ago

There's now a pre-RFC by @chorman0773 that proposes a way to resolve this problem.

rust-lang / rust

The `extern "C"` ABI of x86 vector types depends on target features #116558