rust-lang / unsafe-code-guidelines

Forum for discussion about what unsafe code can and can't do
https://rust-lang.github.io/unsafe-code-guidelines
Apache License 2.0

Tracking where we rely on LLVM giving more guarantees than C #292

Open RalfJung opened 3 years ago

RalfJung commented 3 years ago

There are some places where we rely on LLVM having less UB than C does. It seems good to have a list of those, and to keep a careful eye on what LLVM does in that space, since they might not consider Rust when they adjust their rules here.

If you know of anything else, please let me know so I can add it to the list. :)

mcy commented 3 years ago

I'll definitely post here if anything pops into my head, but I think function pointers, especially function pointer equality, are generally a little sketchy even in the absence of things like -Wl,-icf=all.

I know there's a separate issue about uninit-ness of padding bits (e.g. when is the padding in (u8, u16) uninit), which will probably have LLVM-level considerations (does storing an aggregate to an otherwise initialized alloca make the padding bits undef? I think the answer is no, but it is something to keep an eye out for).
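To make the padding question concrete, here is a minimal sketch that only measures where padding exists; it does not read the padding, and it says nothing about what LLVM does with it. The struct `ByteThenShort` and the function `layout_of_byte_then_short` are hypothetical names for illustration. A `#[repr(C)]` struct stands in for the tuple `(u8, u16)` because tuple layout is unspecified in Rust, and the expected numbers assume a target where `u16` has alignment 2.

```rust
use std::mem::{align_of, size_of};

// Hypothetical stand-in for `(u8, u16)`; `repr(C)` makes the layout predictable.
#[repr(C)]
#[allow(dead_code)]
struct ByteThenShort {
    byte: u8,
    short: u16,
}

/// Returns (size, alignment, number of padding bytes) of `ByteThenShort`.
fn layout_of_byte_then_short() -> (usize, usize, usize) {
    let size = size_of::<ByteThenShort>();
    let align = align_of::<ByteThenShort>();
    // Whatever is not covered by a field is padding.
    let padding = size - size_of::<u8>() - size_of::<u16>();
    (size, align, padding)
}
```

On such a target this reports one padding byte, located between `byte` and `short`. Whether that byte counts as initialized after a whole-aggregate store is exactly the open question raised above.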

Also, when you say TBAA, you probably also mean at LTO time, right? It may be worth noting that cross-language LTO may need to be done without strict aliasing. I actually have no idea if TBAA survives into embedded bitcode and if LLVM is entitled to use it across modules.

thomcc commented 3 years ago

These are probably not only relevant to LLVM, but to the new GCC backend (and gcc-rs too).

(Of course, they also apply to cranelift, but it doesn't do much optimizing, so it seems unlikely to cause many problems)

Still, it's probably worth ensuring we have tests as some sort of "canary" that would help indicate if these rules are violated... It's somewhat likely we have them already, admittedly

RalfJung commented 3 years ago

> I'll definitely post here if anything pops into my head, but I think function pointers, especially function pointer equality, are generally a little sketchy even in the absence of things like -Wl,-icf=all.

Fn ptrs are mostly sketchy because Rust tells LLVM their address does not matter.
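A small sketch of why function pointer equality is hard to rely on. The functions `forty_two_a` and `forty_two_b` are hypothetical; because their addresses are marked as insignificant, a compiler or linker may merge them or keep them apart, so the address comparison below has no guaranteed result.

```rust
fn forty_two_a() -> i32 {
    42
}

fn forty_two_b() -> i32 {
    42
}

/// Compares the addresses of two functions with identical bodies.
/// The result may be `true` or `false` depending on optimization and linking.
fn addresses_equal() -> bool {
    let a: fn() -> i32 = forty_two_a;
    let b: fn() -> i32 = forty_two_b;
    a as *const () == b as *const ()
}

/// Compares what the functions do, which is the only thing that is stable.
fn same_behavior() -> bool {
    let a: fn() -> i32 = forty_two_a;
    let b: fn() -> i32 = forty_two_b;
    a() == b()
}
```

Code that needs a stable identity for a callback is probably better off carrying an explicit tag or ID than comparing addresses.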

> I know there's a separate issue about uninit-ness of padding bits (e.g. when is the padding in (u8, u16) uninit), which will probably have LLVM-level considerations (does storing an aggregate to an otherwise initialized alloca make the padding bits undef? I think the answer is no, but it is something to keep an eye out for).

Here the main problem is that it is rather unclear what the exact rules in C even are (that's the entire indeterminate / unspecified value / trap representation debacle). So I have no idea if we rely on more than what C guarantees, since C doesn't really say in clear terms what it guarantees.^^

> Also, when you say TBAA, you probably also mean at LTO time, right? It may be worth noting that cross-language LTO may need to be done without strict aliasing. I actually have no idea if TBAA survives into embedded bitcode and if LLVM is entitled to use it across modules.

Yes, I also mean at LTO time. I would assume that LLVM correctly handles situations where some modules have TBAA info and others do not.

Of course, when Rust code calls C code, the C UB rules still apply: a C function taking int* and float* (that actually loads both pointers at their expected type) is UB to call with aliasing pointers from Rust. But that is a different discussion.
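For contrast, here is a sketch of the pure-Rust version of that situation, which is allowed because Rust has no type-based aliasing rules. The function names are hypothetical. The equivalent C function taking `int*` and `float*` would be UB to call with these arguments under strict aliasing.

```rust
/// Writes an integer through `int_ptr`, then reads through `float_ptr`.
///
/// # Safety
/// Both pointers must be valid and aligned for their respective accesses.
unsafe fn write_int_read_float(int_ptr: *mut u32, float_ptr: *const f32) -> f32 {
    unsafe {
        // The compiler must assume this store can affect the load below,
        // even though the pointee types differ.
        int_ptr.write(0x3F80_0000); // bit pattern of 1.0_f32
        float_ptr.read()
    }
}

/// Calls the function above with two pointers to the same four bytes.
fn pun_through_aliasing_pointers() -> f32 {
    let mut storage: u32 = 0;
    let int_ptr: *mut u32 = &mut storage;
    let float_ptr = int_ptr as *const f32;
    // SAFETY: both pointers derive from the same live `u32`, which has the
    // same size and alignment as `f32`.
    unsafe { write_int_read_float(int_ptr, float_ptr) }
}
```

If a backend applied C-style TBAA here, it could reorder the load before the store and return `0.0` instead of `1.0`.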

> These are probably not only relevant to LLVM, but to the new GCC backend (and gcc-rs too).

Ah, you mean because GCC treats TBAA more implicitly? Yes, that could be a problem for the GCC backend. I hope the people building it are aware. :)

thomcc commented 3 years ago

> Ah, you mean because GCC treats TBAA more implicitly?

I'm not certain about this. I've heard that it does, but I don't know for sure either way.

I think @antoyo is the person building the GCC backend, and so might plausibly know if this is the case (and would be good to loop in here either way).

antoyo commented 2 years ago

I'm not exactly sure what the question was here, but GCC enables strict-aliasing by default.

RalfJung commented 2 years ago

That means the GCC Rust backends need to find a way to disable strict-aliasing, since otherwise they are unsound.

(And same for the other cases where Rust has less UB than C does, such as wrapping_offset pointer arithmetic, int2ptr casts, pointer comparison. Also there are some subtle questions around zero-sized accesses which do not exist in C.)
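The following sketch illustrates three of the cases just listed: out-of-bounds pointer arithmetic with `wrapping_add`/`wrapping_sub`, an integer-to-pointer round trip, and ordering comparison of pointers into different allocations. The function names are hypothetical, and the examples only show what Rust permits; they are not a statement about how any particular backend lowers these operations.

```rust
/// Moves a pointer far outside its allocation and back again.
fn out_of_bounds_round_trip() -> u8 {
    let data = [10u8, 20, 30];
    let base = data.as_ptr();
    // Computing an out-of-bounds pointer is fine with `wrapping_add`;
    // only dereferencing it while out of bounds would be UB.
    let far_away = base.wrapping_add(1_000_000);
    let back = far_away.wrapping_sub(1_000_000 - 2);
    // SAFETY: `back` equals `base + 2`, which is in bounds of `data`.
    unsafe { *back }
}

/// Casts a pointer to an integer and back, then reads through the result.
fn int_round_trip() -> u32 {
    let value = 7u32;
    let addr = &value as *const u32 as usize;
    let ptr = addr as *const u32;
    // SAFETY: `ptr` has the address of `value`, which is still live.
    unsafe { *ptr }
}

/// Orders two pointers that point into different allocations.
fn compare_unrelated_pointers() -> bool {
    let a = 1u8;
    let b = 2u8;
    let pa: *const u8 = &a;
    let pb: *const u8 = &b;
    // In C, `<` on pointers to different objects is UB; in Rust it simply
    // compares addresses. Exactly one direction holds for distinct objects.
    (pa < pb) != (pb < pa)
}
```

A backend that reuses C's rules for any of these operations would have to disable or work around them, in the same way as for strict aliasing.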

antoyo commented 1 year ago

I'm not exactly sure about those because I'm not familiar enough with LLVM, but maybe those 2 are things where LLVM gives more guarantees than C: