Open JakobDegen opened 1 year ago
Godbolt indicates that this is EarlyCSE in LLVM optimizing the pointer comparison to false despite the two stack allocations have non-overlapping lifetime ranges.
This seems related to pointer provenance.
The bug itself or the cause of the bug? I don't know what the incorrect reasoning that LLVM is using here is, so I can't comment on the cause of the issue. But both Rust and LLVM clearly define pointer comparison to be address based, which means that I don't think we need to think about provenance in order to conclude that this is a miscompilation
This seems related to pointer provenance.
tl;dr even if the docs were changed to take provenance into account when comparing for equality, there would still be a miscompilation, because evaluating the same expression twice leads to different results!
fn main() {
let a: *const u8;
let b: *const u8;
{
let v: [u8; 16] = [core::hint::black_box(0); 16];
a = &(v[0]);
}
{
let v: [u8; 16] = [core::hint::black_box(0); 16];
b = &(v[0]);
}
println!("{a:?} == {b:?} evaluates to {}", a==b);
println!("{a:?} == {b:?} evaluates to {}", a==b);
}
0x7ffd8828d818 == 0x7ffd8828d818 evaluates to false
0x7ffd8828d818 == 0x7ffd8828d818 evaluates to true
This was exactly the train of thought that led me to experiment, thereby uncovering the bug: “Can two pointers compare as equal, even when they belong to different allocations?” I experimented, figured out that they cannot, and tried to refactor my println!
s, whereupon the program started behaving differently. (In the second linked program, uncommenting line 18 changes the output of line 19.)
PS: In case it matters, my experiments were on the stable channel.
tl;dr even if the docs were changed to take provenance into account when comparing for equality, there would still be a miscompilation, because evaluating the same expression twice leads to different results!
In theory they could define pointer comparison to be non-deterministic, and then this would be okay. For a comparison that involves provenance, that is probably the case.
But to my knowledge that's not how LLVM intends their ptr comparison to work.
That last example is really strange though, why would it optimize only one of them? Also I was unable to reproduce this without printing, which is equally strange...
Also I was unable to reproduce this without printing, which is equally strange...
I thought that was queer, so I experimented further.
What I found is as follows:
As long as a pointer is never printed using println!
, the behaviour remains stable. a==b
always evaluates to false
, and even if a
and b
are cast to integers and compared, the comparison still returns false. In other words, the code (continued from above)
let (a_, b_) = (a as u128, b as u128);
dbg!(a_ == b_, a_ ^ b_);
outputs
a_ == b_ = false
a_ ^ b_ = 0
to stderr
, which –to the best of my knowledge– is not exactly how integers are supposed to work.
As soon as either pointer gets printed using println!
, a==b
starts evaluating to true
. Casting them both to u128
before printing them does not cause that; only eg println!("{a:?}")
does.
I tried some alternatives to println!
, it looks like black_box(format_args!("{:?}", a))
is sufficient for it to start evaluating to true
:
// Produces true on following println
black_box(format_args!("{:?}", a));
//black_box(format_args!("{:?}", b));
//format!("{:?}", a);
// Doesn't
//format_args!("{:?}", black_box(a));
//black_box(format_args!("{:?}", black_box(a)));
//format!("{:?}", black_box(a));
The wild thing is that black_box(a)
doesn't cause this... so format_args is an even blacker box than black_box, or so? black_box is supposed to be a wildcard, and for all LLVM knows it might be calling format_args...
In fact looks like adding black_box actually enables the optimization?!?
Reduced to black_box(&a)
(edit: here is a playground link)
The wild thing is that black_box(a) doesn't cause this... so format_args is an even blacker box than black_box, or so? black_box is supposed to be a wildcard, and for all LLVM knows it might be calling format_args...
Yeah, this isn't all too surprising, the difference between black_box(a)
and black_box(&a)
is quite significant
Result from my bisection points to a rollup. Out of that rollup likely #102232, I guess?
searched nightlies: from nightly-2022-01-01 to nightly-2023-02-14 regressed nightly: nightly-2022-09-29 searched commit range: https://github.com/rust-lang/rust/compare/470e518c4b43265020c882bcf3c86632f5494910...ce7f0f1aa0f02c45cad0749e63f3086234b1f422 regressed commit: https://github.com/rust-lang/rust/commit/837bf370de144a682041e68bb67469b9f68a55ce
Also - If I'm correct - this should tag more t-libs than t-compiler
@rustbot label -t-compiler +t-libs +t-libs-api
Is it note-worthy that the issue can trigger without using black_box
at all?
Yeah, this isn't all too surprising, the difference between black_box(a) and black_box(&a) is quite significant
How's that not surprising?^^ In terms of tings like "exposing the pointer", both should be equivalent...
Is it note-worthy that the issue can trigger without using black_box at all?
In fact even the original examples work without black-box, e.g.
fn main() {
let a: *const u8;
let b: *const u8;
{
let v: [u8; 16] = [0; 16];
a = &(v[0]);
}
{
let v: [u8; 16] = [0; 16];
b = &(v[0]);
}
println!("{a:?} == {b:?} evaluates to {}", a==b);
println!("{a:?} == {b:?} evaluates to {}", a==b);
}
Result from my bisection points to a rollup. Out of that rollup likely https://github.com/rust-lang/rust/pull/102232, I guess?
That just stabilizes black_box without changing its behavior. Presumably you need to add a feature flag to continue bisecting this code further into the past?
Definitely looks like a t-compiler issue to me.
Somewhat simpler, still demonstrating misbehaviour, again no black_box
. Likewise with usize rather than *const
types.
Edit: updated even simpler.
I whipped up a quick example for the Compiler Explorer, and tried it with the 10 newest and the 10 oldest stable versions of the compiler. All of them appear to fold the comparison to xor eax eax
, ie all of them arbitrarily decide it's false.
What's even more staggering is that this behaviour persists even if we use strict provenance and expose_addr
. I'd been hoping that this would fix the issue, but no.
Here's an example without the prints in between, using a bunch of black boxes:
https://godbolt.org/z/cKMra38a8
It appears that copying the value from either integer causes llvm to "realize" that they're actually equal. Before that it assumes they're not equal.
Edit: actually only the one black box is needed (let c = black_box(a)
) the others aren't necessary to reproduce
Hmm you are all right here, the black_box
just confused me. And as @DoubleHyphen points out it's not even a regression, I can reproduce with any version of rustc
.
@rustbot label T-compiler -T-libs -T-libs-api P-high
I can see it in C++ too with clang trunk, -O1
(https://c.godbolt.org/z/EjvKTqqax)
#include <stdio.h>
#include <cstdint>
int main(void) {
int *a;
int *b;
{
int v[2] = {0, 0};
a = &(v[0]);
}
{
int v[2] = {0, 0};
b = &(v[0]);
}
printf("%p ==/^ %p is %d, %ld\n", a, b, a == b, ((uintptr_t)a) ^ ((uintptr_t)b));
return 0;
}
outputs:
0x7fffc8ddd080 ==/^ 0x7fffc8ddd080 is 0, 0
One the two zeros at the end is not correct
That C++ program has undefined behavior: you're using pointers after the end of their lifetime. Even if only for a comparison, any use of an invalid pointer's value results in UB. So the weird output is "correct" for C++.
But Rust is supposed to have undefined behavior only if an invalid pointer is dereferenced, not on other kinds of uses of an invalid pointer. And once you add in the cast to integer, I think it's a miscompilation even for C++.
That C++ program has undefined behavior: you're using pointers after the end of their lifetime.
I should have known my meager C++ knowledge was wrong 😅
Anyway, adding the casts earlier where v
is still live still produces the same wrong result: https://c.godbolt.org/z/Tvx3KW3Ex
#include <stdio.h>
#include <cstdint>
int main(void) {
uintptr_t a;
uintptr_t b;
{
int v[2] = {0, 0};
a = (uintptr_t)&(v[0]);
}
{
int v[2] = {0, 0};
b = (uintptr_t)&(v[0]);
}
printf("%ld ==/^ %ld is %d, %ld\n", a, b, a == b, a ^ b);
return 0;
}
140734883788880 ==/^ 140734883788880 is 0, 0
(I may still be wrong, I'm writing C and compiling it as C++ from half remembered knowledge years out of date)
How's that not surprising?^^ In terms of tings like "exposing the pointer", both should be equivalent...
Oh this is interesting, I just checked this on godbolt but was not aware of it before... I had expected the black_box
intrinsic to primarily act like a function call and still pass its arguments via their usual ABI, instead of also exposing their current address...
I noticed that the implementation of black_box
is different for ZSTs. Sure enough, ZST pointers behave differently here (always true, even without black_box
).
Maybe I'm used to c/c++'s behaviour. But Is this really a bug? Do we have any promise that two different allocations have the same address?
@wsy2220 it's not that they must have the same address. It's that the comparison of their addresses results in two different results (true
AND false
) at consecutive points in the program.
Ah now I see it. Thanks.
To me, the integer version is even more surprising... Knowing that pointers have provenance, it's plausible that two pointers might behave inconsistently (on the basis that in the abstract machine, their provenance could be different at different points in the program). However, integers are not supposed to have provenance at all...
@Diggsey which are you referring to? I've only seen cases that use pointers.
It seems that printing the value of usize changes the result of comparation.
fn main() {
let a = {
let v = 0;
&v as *const _ as usize
};
let b = {
let v = 0;
&v as *const _ as usize
};
println!("{}", a==b); // false
println!("{a}") // or {b}
println!("{}", a==b); // true
}
and another case:
fn main() {
let a = {
let v = 0;
&v as *const _ as usize
};
let b = {
let v = 0;
&v as *const _ as usize
};
println!("{}", a==b); // false
println!("{}", a==b); // false
let c = a;
println!("{} {} {}", a==b, a==c, b==c); // false true false
println!("{a} {b}");
println!("{} {} {}", a==b, a==c, b==c); // true true true
}
@pitaj Casting the pointers to integers retains this behaviour. Eggyal noticed as much too, as indicated by his “Likewise with usize” comment.
@SaltyKitkat doesn't even need to be printing, I provided a case above that doesn't even directly use prints.
@DoubleHyphen yes I'm aware of those, my cases above are pretty much the same. I thought @Diggsey was saying they had one without any use of pointers
Maybe this is useful, please remove if not. I decided to test this on some less common platforms: macOS aarch64, OpenBSD aarch64 (i.e. a Raspberry Pi), FreeBSD x86_64. The output I get is the same across all these platforms, it's consistent, and the comparisons are consistently correct.
[(23:39:40) luna@anugraha ~/wtf] cargo run
Compiling wtf v0.1.0 (/Users/luna/wtf)
Finished dev [unoptimized + debuginfo] target(s) in 0.68s
Running `target/debug/wtf`
6130756828 == 6130756844 evaluates to false
6130756828 == 6130756844 evaluates to false
[(23:39:43) luna@anugraha ~/wtf] cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/wtf`
6130592988 == 6130593004 evaluates to false
6130592988 == 6130593004 evaluates to false
[(23:39:43) luna@anugraha ~/wtf] cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/wtf`
6162787548 == 6162787564 evaluates to false
6162787548 == 6162787564 evaluates to false
[(23:39:46) luna@anugraha ~/wtf] cargo run
Finished dev [unoptimized + debuginfo] target(s) in 0.00s
Running `target/debug/wtf`
6094712028 == 6094712044 evaluates to false
6094712028 == 6094712044 evaluates to false
Perhaps this is something Linux- or Linux/x86_64-specific?
See upstream issue here: https://github.com/llvm/llvm-project/issues/45725. I think https://github.com/llvm/llvm-project/issues/45725#issuecomment-981029774 explains the actual problem.
Maybe this is useful, please remove if not. I decided to test this on some less common platforms: macOS aarch64, OpenBSD aarch64 (i.e. a Raspberry Pi), FreeBSD x86_64. The output I get is the same across all these platforms, it's consistent, and the comparisons are consistently correct.
[(23:39:40) luna@anugraha ~/wtf] cargo run Compiling wtf v0.1.0 (/Users/luna/wtf) Finished dev [unoptimized + debuginfo] target(s) in 0.68s Running `target/debug/wtf` 6130756828 == 6130756844 evaluates to false 6130756828 == 6130756844 evaluates to false [(23:39:43) luna@anugraha ~/wtf] cargo run Finished dev [unoptimized + debuginfo] target(s) in 0.00s Running `target/debug/wtf` 6130592988 == 6130593004 evaluates to false 6130592988 == 6130593004 evaluates to false [(23:39:43) luna@anugraha ~/wtf] cargo run Finished dev [unoptimized + debuginfo] target(s) in 0.00s Running `target/debug/wtf` 6162787548 == 6162787564 evaluates to false 6162787548 == 6162787564 evaluates to false [(23:39:46) luna@anugraha ~/wtf] cargo run Finished dev [unoptimized + debuginfo] target(s) in 0.00s Running `target/debug/wtf` 6094712028 == 6094712044 evaluates to false 6094712028 == 6094712044 evaluates to false
Perhaps this is something Linux- or Linux/x86_64-specific?
Did you try run the release build(cargo run -r
) ?
Ah, no, my mistake, I apologize! I do see the behavior described above when I run it in release mode; feel free to mark as offtopic.
/r/rust already managed to turn this into a segfault in safe code: https://www.reddit.com/r/rust/comments/112crfj/comment/j8ml9zh/?utm_source=reddit&utm_medium=web2x&context=3
/r/rust already managed to turn this into a segfault in safe code: https://www.reddit.com/r/rust/comments/112crfj/comment/j8ml9zh/?utm_source=reddit&utm_medium=web2x&context=3
Inlining the code to avoid indirection/loss, credits to u/duckerude:
use std::cell::RefCell;
fn main() {
let a = {
let v = 0u8;
&v as *const _ as usize
};
let b = {
let v = 0u8;
&v as *const _ as usize
};
let i = a - b;
let arr = [
RefCell::new(Some(Box::new(1))),
RefCell::new(None),
RefCell::new(None),
];
assert_ne!(i, 0);
let r = arr[i].borrow();
let r = r.as_ref().unwrap();
*arr[0].borrow_mut() = None;
println!("{}", *r);
}
Same example, but without interior mutability:
fn main() {
let a = {
let v = 0u8;
&v as *const _ as usize
};
let b = {
let v = 0u8;
&v as *const _ as usize
};
let i = a - b;
let mut arr = [
Some(Box::new(1)),
None,
];
assert_ne!(i, 0);
let (a, b) = arr.split_at_mut(i);
let r = b[0].as_ref().unwrap();
a[0] = None;
println!("{}", *r);
}
FWIW, inlining also changes the behavior. Run the following program in release mode:
#[inline(never)]
fn cmp(a: usize, b: usize) -> bool {
a == b
}
#[inline(always)]
fn cmp_in(a: usize, b: usize) -> bool {
a == b
}
fn main() {
let a = {
let v = 0;
&v as *const _ as usize
};
let b = {
let v = 0;
&v as *const _ as usize
};
println!("{a:?} == {b:?} -> ==: {}, cmp_in: {}, cmp: {}", a==b, cmp_in(a, b), cmp(a, b));
println!("{a:?} == {b:?} -> ==: {}, cmp_in: {}, cmp: {}", a==b, cmp_in(a, b), cmp(a, b));
}
Disabling inlining fixes the issue - presumably because the optimization context gets lost?
Having seen similar things being discussed over the years, I'm surprised nobody has linked any of them here yet. Earliest I could find (though I feel like there were older discussions? maybe more informal, on IRC, though) was @RalfJung's own post from 2015: https://internals.rust-lang.org/t/comparing-dangling-pointers/3019
How much has changed since that post? My intuition was that, if the comparison result is i1 undef
(or at least similar enough), then the observed behavior here is compatible with that. The post even states:
In particular, not only is the result of comparing a dangling pointers indeterminate and hence non-deterministic, it is actually kind of “lazily picked”: Multiple comparisons can have different results.
So what has changed since? Had LLVM promised to be more careful than that post was worried about?
Was @RalfJung's initial analysis incomplete? I tried skimming the thread but most of it seemed to be people failing to exploit any undef
s to get a miscompilation, at least in several different configurations.
@eddyb: @eggyal demonstrated the issue without any pointer comparisons or operations on dangling pointers (namely, the pointers were cast to usize when still valid).
The root cause might be the same, but it seems different from a language semantics point of view.
Average Rust compiler quality. Should have used gccrs.
namely, the pointers were cast to usize when still valid
To be perfectly honest, it still felt plausible that obtaining the address of a valid pointer can make that value "degrade" once the pointer goes dangling, but I suppose that's a bit ridiculous of me (and I don't think even the C or C++ standards allow that?).
Another thing I almost missed, in the UCG issue on the topic:
I am trying to figure out if LLVM optimizations clearly indicate one or the other choice; so far that has been inconclusive.
So I could totally see LLVM as not intending to allow such strange outcomes, but them only creeping in through optimizations that could then be considered bugs.
The root cause might be the same,
Sure, I just expected decade-old issues with very similar descriptions and was a bit surprised not to find them. (I'm not particularly advocating for anything, just curious of historical context in general)
From my side, the difference compared to back then is that from what I learned, LLVM does not have C's "pointer zapping". And while it is hard to get definite statements on LLVM semantics, I am also now convinced that ptr comparison in LLVM ignores provenance. The fact that no LLVM dev has argued that the current behavior is correct seems to confirm this.
Hm, running this https://github.com/rust-lang/rust/issues/107975#issuecomment-1431758601 example with different rustc versions (cargo run --release)
1.40...1.64:
thread 'main' panicked at 'assertion failed: `(left != right)`
left: `0`,
right: `0`', src/main.rs:18:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
1.65:
thread 'main' panicked at 'already borrowed: BorrowMutError', src/main.rs:21:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
1.66..now
segmentation fault
Hm, running this #107975 (comment) example with different rustc versions (cargo run --release)
1.40...1.64:
thread 'main' panicked at 'assertion failed: `(left != right)` left: `0`, right: `0`', src/main.rs:18:5 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
1.65:
thread 'main' panicked at 'already borrowed: BorrowMutError', src/main.rs:21:6 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
1.66..now
segmentation fault
Huh! How interesting. Did you try the one without interior mutability?
Huh! How interesting. Did you try the one without interior mutability?
that one starts segfaulting in 1.65 and has the left != right panic otherwise (or stops compiling due to assert_ne
not existing before 1.8 or sth
the following segfaults from 1.0 to 1.7, panics from then on
fn main() {
let a = {
let v = 0u8;
&v as *const _ as usize
};
let b = {
let v = 0u8;
&v as *const _ as usize
};
let i = a - b;
let mut arr = [
Some(Box::new(1)),
None,
];
if i == 0 {
panic!()
}
let (a, b) = arr.split_at_mut(i);
let r = b[0].as_ref().unwrap();
a[0] = None;
println!("{}", *r);
}
Yeah, these types of reproductions are pretty sensitive to optimization noise, so I'm not surprised that there isn't a single version that consistently reproduces across versions
With debug assertions enabled I also get the 'already borrowed: BorrowMutError'
thingy on master, so I guess it was an assertion introduced with 1.65 but then moved to debug assertion in 1.66 and forward :thinking:
I know the discussion has moved beyond this, so I apologise for the noise but I wanted to share this variation on my earlier example that (no doubt due to inlining) still exhibits the behaviour:
fn f() -> usize {
let v = 0;
&v as *const _ as _
}
fn main() {
let a = f();
let b = f();
println!("{a:?} == {b:?} evaluates to {}", a==b);
println!("{a:?} == {b:?} evaluates to {}", a==b);
}
Output
140727719695976 == 140727719695976 evaluates to false 140727719695976 == 140727719695976 evaluates to true
I tried this code:
I expected to see this happen: Either the pointers (when cast to integers) are the same and the comparison is
true
, or they are not the same and the comparison isfalse
.Instead, this happened: It printed:
(140728325198984, 140728325198984, false)
Upstream LLVM issue
Meta
Reproduced via
rustc +nightly -Copt-level=3 test.rs && ./test
.rustc --version --verbose
:Also reproduces on master.
@rustbot label +I-unsound +T-compiler +A-llvm