Closed ldm0 closed 3 months ago
transmuting an integer to a pointer makes it so that the pointer does not have any provenance, meaning it is not valid to dereference. the PR you referenced was very much intended to break code like this.
error: Undefined Behavior: memory access failed: 0x255a8[noalloc] is a dangling pointer (it has no provenance)
--> src/main.rs:8:35
|
8 | std::hint::black_box(unsafe { *x });
| ^^ memory access failed: 0x255a8[noalloc] is a dangling pointer (it has no provenance)
|
= help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
= help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
= note: BACKTRACE:
= note: inside `foo` at src/main.rs:8:35: 8:37
note: inside `main`
--> src/main.rs:12:5
|
12 | foo(K { b: &x as *const i8 as usize });
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=90c384e2cf026c7431e961fc8cd93888 See https://doc.rust-lang.org/stable/1.79.0/std/ptr/index.html#provenance and the following blog posts for information on provenance:
As far as I can tell, integer-sized-struct-to-pointer conversions are not UB 🤔 : https://doc.rust-lang.org/nightly/std/intrinsics/fn.transmute.html.
On the page you linked it specifically says:
Transmuting integers to pointers is a largely unspecified operation. It is likely not equivalent to an as cast. Doing non-zero-sized memory accesses with a pointer constructed this way is currently considered undefined behavior.
All this also applies when the integer is nested inside an array, tuple, struct, or enum.
@Noratrieb @RossSmyth Thanks for your quick response. But when did that become a UB?
The transmute
's documentation didn't specify that before https://github.com/rust-lang/rust/pull/122379. It's more like a breaking change and should be gated under nightly feature like #![feature(strict_provenance)]
.
as usual, Rusts unsafe semantics have been underspecified for ages, and are slowly getting more clear, which does sadly break code that relied on them being a particular thing before. it's an inevitable consequences of having an underspecified model and tightening that up while keeping semantics that can allow for many optimizations, which are vital to making Rust fast.
while the ptr documentation says it's non-normative and part of the strict provenance experiment, this isn't related to strict provenance. strict provenance is nothing other than a style of writing unsafe code (to never cast integers to pointers) and does not imply any language rules.
the reason why it's undefined behavior to do what you're doing is because otherwise every read of memory would have to be considered a side effect by the optimizer, as it could do an implicit transmutation which would have to expose the provenance as described in Ralf's blog.
The transmute is lawful. Rust says that a pointer does indeed have all the bytes of either a u16, u32, u64 (or possibly u128...?), depending on the platform, and all bits of an integer's bitpattern are valid pointer bytes.
The pointer deref is not. Rust says that dereferencing a pointer that does not point to within an allocation (including an "allocation" in global memory or on the stack) is not defined behavior, and Rust must somehow decide when a pointer points within an object.
@ldm0 Have you also read the std::ptr
documentation?
The precise rules for validity are not determined yet. The guarantees that are provided at this point are very minimal:
- For operations of size zero, every pointer is valid, including the null pointer. The following points are only concerned with non-zero-sized accesses.
- A null pointer is never valid.
- For a pointer to be valid, it is necessary, but not always sufficient, that the pointer be dereferenceable: the memory range of the given size starting at the pointer must all be within the bounds of a single allocated object. Note that in Rust, every (stack-allocated) variable is considered a separate allocated object.
Please consider that we have chosen to accept RFC 3559 as an explanation for why memory is sometimes dereferenceable, and that in days of yore, "undefined behavior" was simply "whatever the language didn't nail down". The refinement into today's "unconstrained behavior" came later.
...and why, anyways, is it the case that an integer-sized struct would differ from an integer here? Especially considering you applied repr(transparent)
to it.
...and why, anyways, is it the case that an integer-sized struct would differ from an integer here? Especially considering you applied repr(transparent) to it.
repr(transparent)
doesn't matter at all. But yeah, as currently the doc states:
Transmuting integers to pointers is a largely unspecified operation. It is likely not equivalent to an as cast. Doing non-zero-> sized memory accesses with a pointer constructed this way is currently considered undefined behavior.
All this also applies when the integer is nested inside an array, tuple, struct, or enum.
I am fine with it, even if it wasn't mentioned anywhere several months ago(https://github.com/rust-lang/rust/issues/128409#issuecomment-2259540549) :-/
The transmute is lawful. Rust says that a pointer does indeed have all the bytes of either a u16, u32, u64 (or possibly u128...?), depending on the platform, and all bits of an integer's bitpattern are valid pointer bytes.
The pointer deref is not. Rust says that dereferencing a pointer that does not point to within an allocation (including an "allocation" in global memory or on the stack) is not defined behavior, and Rust must somehow decide when a pointer points within an object.
I am still confused.
as
casting lawful?Currently, this code works as expected:
pub fn foo(k: usize) {
let x = k as *mut u8;
std::hint::black_box(unsafe { *x });
}
transmute
not lawful.gep ptr null
into unreachable: https://rust.godbolt.org/z/zWTc7ofaa
But if it does someday, many code would break :-/you should read Ralf's linked blog posts, they should contain some of the reasoning behind it all
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=90c384e2cf026c7431e961fc8cd93888 See https://doc.rust-lang.org/stable/1.79.0/std/ptr/index.html#provenance and the following blog posts for information on provenance:
the std::ptr
docs link is broken. this https://doc.rust-lang.org/1.79.0/std/ptr/index.html#provenance (no /stable/
) works, though
2. Thus many code in the wild are UB, including backtrace-rs.
I did a detailed crater analysis in the PR that made this change, and submitted a handful of PRs and issue reports to the affected codebases we could locate. We make changes like this carefully, and we are not ignorant of how Rust is used.
We would not completely break backtrace-rs, because it is part of the standard library. Such a PR would not merge.
Function pointers are different from data pointers, but that's a bit besides the point. I'm sorry we didn't find your project, but we take steps to not randomly break a large fraction of the ecosystem with a codegen change like this.
@ldm0 re: Is dereference after as casting lawful?
- If it's lawful, why is dereference after int-to-ptr transmute not lawful.
- If it's not lawful, how would you expect people write Windows FFI code?
There are two reasons this currently works, and should continue to work, especially if you aren't also doing something that I will briefly say is "sus". You and I both know you shouldn't make up random addresses and dereference them, yes? This isn't about whether or not rustc should compile blatantly unsound code, it is "how do I write some code that should work".
So, the as
operation is
as
itself.as
, it Just Works, there's no caveat about the size of the argument or whatever.Thus this:
let x = gen_usize();
let x = x as *mut Whatever;
unsafe { *x }
is in fact the gesture that invokes "rustc attempts to make something up that would justify your program". The mechanism for this "rustc attempts to justify your program to itself" is unspecified, but it is known that we will try to assign it a provenance. Somehow. A provenance other than the null provenance.
The reason mem::transmute
doesn't do this is because
unsafe
, so you take on all burdens of soundness here.ptr::read
of the argument.Reading a random integer will read bytes without provenance, because integers don't have provenance, because assigning provenance to integers disables various optimizations. For instance, if every instance of "5" has a potential provenance, then we can't simply unify instances of calculating or equating the number "5". Thus we would rather keep all these "bitcast" operations as low-overhead and low-cost to programs that are using unsafe code in a sound way.
As the current maintainer of the backtrace code: yes, that crate has a lot of work to do.
Anyway, no action seems necessary here.
Code sample:
Expected to generate asm like this(for target
aarch64-apple-darwin
):However, the generated asm looks like this:
Godbolt: https://rust.godbolt.org/z/s5PY4TPME
I did some research, and found that Rust emits such LLVM IR for the
transmute
:However,
load (gep ptr null)
is assumed asunreachable
in LLVM'sInstCombine
pass, thusbrk
is generated.The
transmute
was orignally lowered asinttoptr
rather thangetelementptr i8 ptr null..
, which is legal and the result assembly won't crash.I performed a bisection and discovered that the codegen change was introduced byhttps://github.com/rust-lang/rust/pull/121282. I suspect that this PR was intended to address integer-to-pointer conversions, but it inadvertently affects the transmute of integer-sized-struct-to-pointer conversions.
As far as I can tell, integer-sized-struct-to-pointer conversions are not UB 🤔 : https://doc.rust-lang.org/nightly/std/intrinsics/fn.transmute.html.