Closed jwong101 closed 1 month ago
This is a bug in const eval and miri. ptr.add(off)
is always supposed to behave exactly like ptr.offset(off as isize)
.
regression in https://github.com/rust-lang/rust/pull/128482 cc @RalfJung
We are just computing off * size_of<T>()
here without casting to isize
first, so I suspect that to be the issue: https://github.com/rust-lang/rust/blob/6f7229c4da0471f1470bb1f86071848cba3a23d9/compiler/rustc_const_eval/src/interpret/operator.rs#L306-L316
@rustbot label -needs-triage -A-docs +C-bug +T-compiler +A-const-eval +regression-from-stable-to-beta
This is a bug in const eval and miri.
ptr.add(off)
is always supposed to behave exactly likeptr.offset(off as isize)
.
Surely that is not the case? We want to emit getelementptr inbounds nuw
for ptr.add()
, making use of the fact that the index is non-negative. We added this attribute in LLVM pretty much specifically so ptr.add()
can use it. We could obviously add yet another method to encode this, but given that we already have a variant that exists specifically to add an unsigned offset, I would hope that nobody wants to go there.
To me the current wording "The computed offset, count * size_of::
cc @scottmcm @RalfJung
My interpretation of "computation must not overflow an isize" is "do the multiplication on mathematical values; the result must fit in the value range of an isize".
But you are right that this contradicts the part about "convenience for offset(x as isize)".
Regarding the Miri behavior, it seems the intrinsic at some point got changed to directly accept a usize https://github.com/rust-lang/rust/pull/110837 but opsem/miri teams did not get informed so the logic in Miri does not make sense for this way of using the intrinsic.
https://github.com/rust-lang/rust/pull/130229 should fix this. Thanks for catching this, @jwong101 !
byte_add(!0)
multiplies usize::MAX * 1
(not -1 * 1
!), the result is usize::MAX
, and that does not fit in an isize
, so this is always UB because it "overflows an isize".
In core::ptr::add, is the resulting offset treated as a signed integer or unsigned integer for the following requirement:
Since the resulting offset must fit in an isize
, this should not make a difference. (I would say it is treated as a mathematical integer.)
I have a clarification question about this example:
#[no_mangle]
pub const fn mul() -> i32 {
const {
let x = &[0i32; 2];
let x = x.as_ptr().wrapping_add(1);
unsafe {
// has UB according to MIRI and fails compilation on nightly if you comment this out
// mul nuw i64 -1, 4 is poison
// x.add(!0).read();
// DB and passes compilation
// mul nsw i64 -1, 4 is defined
x.offset(-1).read();
}
x[0]
}
}
Where does the mul nuw
come from? As far as I can tell by looking at our codegen backend, our inbounds_gep
does not care whether the RHS is signed or unsigned. So under current codegen, add(x)
truly is equivalent to offset(x as isize)
, right? Just Miri and const-eval do not treat it equivalently? And we want codegen to treat it not equivalently, but that has not happened yet?
So under current codegen,
add(x)
truly is equivalent tooffset(x as isize)
, right? Just Miri and const-eval do not treat it equivalently?
That's also what I figured by looking at the source code and the reason why I initially thought that Miri/CTFE treating them differently was just a bug.
It was definitely an accident -- Miri did multiplication and overflow check at "whatever the type of the count
is", so when usize
counts were allowed, those used usize
multiplication (and then the result got implicitly cast into isize
). That code was only ever tested for isize
counts, but it happened to do "something" for usize
that doesn't explode immediately, but ended up with somewhat odd UB checks.
With https://github.com/rust-lang/rust/pull/130239, the unsigned intrinsic is now properly supported by Miri (the only change required was to make the final cast to isize
be UB if it does not preserve the mathematical value); the docs update is happening in https://github.com/rust-lang/rust/pull/130229.
Where does the mul nuw come from? As far as I can tell by looking at our codegen backend, our inbounds_gep does not care whether the RHS is signed or unsigned
Sorry for the confusion. I was just using mul nuw
to showcase how Miri + the constant interpreter were reporting unsigned multiplication overflow as UB.
So under current codegen, add(x) truly is equivalent to offset(x as isize), right? Just Miri and const-eval do not treat it equivalently?
Yeah, we just use inbounds
atm, which only requires mul nsw
and that the addition of the signed offset doesn't wrap the address space.
And we want codegen to treat it not equivalently, but that has not happened yet?
Yeah, I was curious on if it was legal for us to use the new nuw
attribute, which was why I checked the docs + what Miri did.
That code was only ever tested for isize counts, but it happened to do "something" for usize that doesn't explode immediately, but ended up with somewhat odd UB checks.
Yeah, I was initially confused when I saw that add
was allowed to go backwards, because I could've sworn that I saw Miri reporting that as UB awhile ago.
Though now that I think about it, I was probably just using ptr::add
at the start of an object, which is UB out of the "inbounds of an allocated object" requirement.
Location
core::ptr::add https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L810-L854
core::ptr::offset https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L349-L393
Summary
The documentation for
core::ptr::add
currently states that it:https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L810
Except it's not a convenience for just doing
.offset(count as isize)
, both according to Miri and the constant interpreter:So one might infer that the following requirement:
https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L819
means that the
count * size_of::<T>()
must not overflow in the unsigned sense foradd
and signed sense foroffset
.However, it's unclear if
overflowing isize
means the unsigned multiplication result has to be positive, or if the unsigned multiplication also has to not wrap in the signed sense. Both Miri and the constant interpreter allow the following:However,
x.add(offset)
will also get lowered togetelementptr inbounds i32, %x, i64 %offset
, which has UB according to the LLVM langref.If the
core::ptr::add
does have defined behavior in the above case, then we can't directly usegetelementptr inbounds
for types larger than a byte, without doing amul nuw
beforehand, and then doinggetelementptr inbounds i8
. However, if it's UB, then the docs should make it clear.So
core::ptr::add
still carries the responsibility of preventing signed overflow in the multiplication(at least according to our codegen semantics). However, it's unclear if "must not overflow isize", means that the unsigned result has to fit in[0, isize::MAX]
, as the following passes:To summarize:
count * size_of::<T>()
need to not overflow in both the signed and unsigned sense forcore::ptr::add
?core::ptr::add
, is the resulting offset treated as a signed integer or unsigned integer for the following requirement: https://github.com/rust-lang/rust/blob/33855f80d4393bff5d1226eabf8e61f348179cee/library/core/src/ptr/const_ptr.rs#L821-L824Rationale for inquiry:
LLVM recently added a
nuw
attribute to thegetelementptr
instruction in order to inform the optimizer that the offset is non-negative. I was checking thecore::ptr::add
documentation to see if we could use it for that, however, I found some contradictory information w.r.t the docs and the current behavior of the constant interpreter and Miri.