rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.23k stars 12.7k forks source link

Miscompilation: Equal pointers comparing as unequal #107975

Open JakobDegen opened 1 year ago

JakobDegen commented 1 year ago

I tried this code:

pub fn foo() -> (usize, usize, bool) {
    let a: *const u8;
    let b: *const u8;
    {
        let v: [u8; 16] = [std::hint::black_box(4); 16];
        a = &(v[0]);
    }
    {
        let v: [u8; 16] = [std::hint::black_box(4); 16];
        b = &(v[0]);
    }
    (a as usize, b as usize, a == b)
}

fn main() {
    println!("{:?}", foo());
}

I expected to see this happen: Either the pointers (when cast to integers) are the same and the comparison is true, or they are not the same and the comparison is false.

Instead, this happened: It printed: (140728325198984, 140728325198984, false)

Upstream LLVM issue

Meta

Reproduced via rustc +nightly -Copt-level=3 test.rs && ./test.

rustc --version --verbose:

rustc 1.69.0-nightly (5b8f28453 2023-02-12)
binary: rustc
commit-hash: 5b8f284536d00ba649ca968584bedab4820d8527
commit-date: 2023-02-12
host: x86_64-unknown-linux-gnu
release: 1.69.0-nightly
LLVM version: 15.0.7

Also reproduces on master.

@rustbot label +I-unsound +T-compiler +A-llvm

JakobDegen commented 1 year ago

Godbolt indicates that this is EarlyCSE in LLVM optimizing the pointer comparison to false despite the two stack allocations have non-overlapping lifetime ranges.

zirconium-n commented 1 year ago

This seems related to pointer provenance.

JakobDegen commented 1 year ago

The bug itself or the cause of the bug? I don't know what the incorrect reasoning that LLVM is using here is, so I can't comment on the cause of the issue. But both Rust and LLVM clearly define pointer comparison to be address based, which means that I don't think we need to think about provenance in order to conclude that this is a miscompilation

DoubleHyphen commented 1 year ago

This seems related to pointer provenance.

tl;dr even if the docs were changed to take provenance into account when comparing for equality, there would still be a miscompilation, because evaluating the same expression twice leads to different results!

fn main() {
    let a: *const u8;
    let b: *const u8;
    {
        let v: [u8; 16] = [core::hint::black_box(0); 16];
        a = &(v[0]);
    }
    {
        let v: [u8; 16] = [core::hint::black_box(0); 16];
        b = &(v[0]);
    }
    println!("{a:?} == {b:?} evaluates to {}", a==b);
    println!("{a:?} == {b:?} evaluates to {}", a==b);
}
0x7ffd8828d818 == 0x7ffd8828d818 evaluates to false
0x7ffd8828d818 == 0x7ffd8828d818 evaluates to true

Playground link

This was exactly the train of thought that led me to experiment, thereby uncovering the bug: “Can two pointers compare as equal, even when they belong to different allocations?” I experimented, figured out that they cannot, and tried to refactor my println!s, whereupon the program started behaving differently. (In the second linked program, uncommenting line 18 changes the output of line 19.)

PS: In case it matters, my experiments were on the stable channel.

RalfJung commented 1 year ago

tl;dr even if the docs were changed to take provenance into account when comparing for equality, there would still be a miscompilation, because evaluating the same expression twice leads to different results!

In theory they could define pointer comparison to be non-deterministic, and then this would be okay. For a comparison that involves provenance, that is probably the case.

But to my knowledge that's not how LLVM intends their ptr comparison to work.

That last example is really strange though, why would it optimize only one of them? Also I was unable to reproduce this without printing, which is equally strange...

DoubleHyphen commented 1 year ago

Also I was unable to reproduce this without printing, which is equally strange...

I thought that was queer, so I experimented further.

What I found is as follows:

Imberflur commented 1 year ago

I tried some alternatives to println!, it looks like black_box(format_args!("{:?}", a)) is sufficient for it to start evaluating to true:

// Produces true on following println

black_box(format_args!("{:?}", a));
//black_box(format_args!("{:?}", b));
//format!("{:?}", a);

// Doesn't

//format_args!("{:?}", black_box(a));
//black_box(format_args!("{:?}", black_box(a)));
//format!("{:?}", black_box(a));
RalfJung commented 1 year ago

The wild thing is that black_box(a) doesn't cause this... so format_args is an even blacker box than black_box, or so? black_box is supposed to be a wildcard, and for all LLVM knows it might be calling format_args...

In fact looks like adding black_box actually enables the optimization?!?

Imberflur commented 1 year ago

Reduced to black_box(&a) (edit: here is a playground link)

JakobDegen commented 1 year ago

The wild thing is that black_box(a) doesn't cause this... so format_args is an even blacker box than black_box, or so? black_box is supposed to be a wildcard, and for all LLVM knows it might be calling format_args...

Yeah, this isn't all too surprising, the difference between black_box(a) and black_box(&a) is quite significant

apiraino commented 1 year ago

Result from my bisection points to a rollup. Out of that rollup likely #102232, I guess?

searched nightlies: from nightly-2022-01-01 to nightly-2023-02-14 regressed nightly: nightly-2022-09-29 searched commit range: https://github.com/rust-lang/rust/compare/470e518c4b43265020c882bcf3c86632f5494910...ce7f0f1aa0f02c45cad0749e63f3086234b1f422 regressed commit: https://github.com/rust-lang/rust/commit/837bf370de144a682041e68bb67469b9f68a55ce

bisected with cargo-bisect-rustc v0.6.5 Host triple: x86_64-unknown-linux-gnu Reproduce with: ```bash cargo bisect-rustc --start=2022-01-01 --script script.sh ```

Also - If I'm correct - this should tag more t-libs than t-compiler

@rustbot label -t-compiler +t-libs +t-libs-api

DoubleHyphen commented 1 year ago

Is it note-worthy that the issue can trigger without using black_box at all?

RalfJung commented 1 year ago

Yeah, this isn't all too surprising, the difference between black_box(a) and black_box(&a) is quite significant

How's that not surprising?^^ In terms of tings like "exposing the pointer", both should be equivalent...

Is it note-worthy that the issue can trigger without using black_box at all?

In fact even the original examples work without black-box, e.g.

fn main() {
    let a: *const u8;
    let b: *const u8;
    {
        let v: [u8; 16] = [0; 16];
        a = &(v[0]);
    }
    {
        let v: [u8; 16] = [0; 16];
        b = &(v[0]);
    }
    println!("{a:?} == {b:?} evaluates to {}", a==b);
    println!("{a:?} == {b:?} evaluates to {}", a==b);
}
RalfJung commented 1 year ago

Result from my bisection points to a rollup. Out of that rollup likely https://github.com/rust-lang/rust/pull/102232, I guess?

That just stabilizes black_box without changing its behavior. Presumably you need to add a feature flag to continue bisecting this code further into the past?

Definitely looks like a t-compiler issue to me.

eggyal commented 1 year ago

Somewhat simpler, still demonstrating misbehaviour, again no black_box. Likewise with usize rather than *const types.

Edit: updated even simpler.

DoubleHyphen commented 1 year ago

I whipped up a quick example for the Compiler Explorer, and tried it with the 10 newest and the 10 oldest stable versions of the compiler. All of them appear to fold the comparison to xor eax eax, ie all of them arbitrarily decide it's false.

What's even more staggering is that this behaviour persists even if we use strict provenance and expose_addr. I'd been hoping that this would fix the issue, but no.

pitaj commented 1 year ago

Here's an example without the prints in between, using a bunch of black boxes:

https://godbolt.org/z/cKMra38a8

It appears that copying the value from either integer causes llvm to "realize" that they're actually equal. Before that it assumes they're not equal.

Edit: actually only the one black box is needed (let c = black_box(a)) the others aren't necessary to reproduce

apiraino commented 1 year ago

Hmm you are all right here, the black_box just confused me. And as @DoubleHyphen points out it's not even a regression, I can reproduce with any version of rustc.

@rustbot label T-compiler -T-libs -T-libs-api P-high

poliorcetics commented 1 year ago

I can see it in C++ too with clang trunk, -O1 (https://c.godbolt.org/z/EjvKTqqax)

#include <stdio.h>
#include <cstdint>

int main(void) {
    int *a;
    int *b;

    {
        int v[2] = {0, 0};
        a = &(v[0]);
    }

    {
        int v[2] = {0, 0};
        b = &(v[0]);
    }

    printf("%p ==/^ %p is %d, %ld\n", a, b, a == b, ((uintptr_t)a) ^ ((uintptr_t)b));

    return 0;
}

outputs:

0x7fffc8ddd080 ==/^ 0x7fffc8ddd080 is 0, 0

One the two zeros at the end is not correct

dgrunwald commented 1 year ago

That C++ program has undefined behavior: you're using pointers after the end of their lifetime. Even if only for a comparison, any use of an invalid pointer's value results in UB. So the weird output is "correct" for C++.

But Rust is supposed to have undefined behavior only if an invalid pointer is dereferenced, not on other kinds of uses of an invalid pointer. And once you add in the cast to integer, I think it's a miscompilation even for C++.

poliorcetics commented 1 year ago

That C++ program has undefined behavior: you're using pointers after the end of their lifetime.

I should have known my meager C++ knowledge was wrong 😅

Anyway, adding the casts earlier where v is still live still produces the same wrong result: https://c.godbolt.org/z/Tvx3KW3Ex

#include <stdio.h>
#include <cstdint>

int main(void) {
    uintptr_t a;
    uintptr_t b;

    {
        int v[2] = {0, 0};
        a = (uintptr_t)&(v[0]);
    }

    {
        int v[2] = {0, 0};
        b = (uintptr_t)&(v[0]);
    }

    printf("%ld ==/^ %ld is %d, %ld\n", a, b, a == b, a ^ b);

    return 0;
}
140734883788880 ==/^ 140734883788880 is 0, 0

(I may still be wrong, I'm writing C and compiling it as C++ from half remembered knowledge years out of date)

JakobDegen commented 1 year ago

How's that not surprising?^^ In terms of tings like "exposing the pointer", both should be equivalent...

Oh this is interesting, I just checked this on godbolt but was not aware of it before... I had expected the black_box intrinsic to primarily act like a function call and still pass its arguments via their usual ABI, instead of also exposing their current address...

pitaj commented 1 year ago

I noticed that the implementation of black_box is different for ZSTs. Sure enough, ZST pointers behave differently here (always true, even without black_box).

https://play.rust-lang.org/?version=stable&mode=release&edition=2021&gist=47d579c7aea7decf3cde9be8f5a6a364

wsy2220 commented 1 year ago

Maybe I'm used to c/c++'s behaviour. But Is this really a bug? Do we have any promise that two different allocations have the same address?

pitaj commented 1 year ago

@wsy2220 it's not that they must have the same address. It's that the comparison of their addresses results in two different results (true AND false) at consecutive points in the program.

wsy2220 commented 1 year ago

Ah now I see it. Thanks.

Diggsey commented 1 year ago

To me, the integer version is even more surprising... Knowing that pointers have provenance, it's plausible that two pointers might behave inconsistently (on the basis that in the abstract machine, their provenance could be different at different points in the program). However, integers are not supposed to have provenance at all...

pitaj commented 1 year ago

@Diggsey which are you referring to? I've only seen cases that use pointers.

SaltyKitkat commented 1 year ago

It seems that printing the value of usize changes the result of comparation.

fn main() {
    let a = {
        let v = 0;
        &v as *const _ as usize
    };
    let b = {
        let v = 0;
        &v as *const _ as usize
    };
    println!("{}", a==b); // false
    println!("{a}") // or {b}
    println!("{}", a==b); // true
}

and another case:

fn main() {
    let a = {
        let v = 0;
        &v as *const _ as usize
    };
    let b = {
        let v = 0;
        &v as *const _ as usize
    };
    println!("{}", a==b); // false
    println!("{}", a==b); // false
    let c = a;
    println!("{} {} {}", a==b, a==c, b==c); // false true false
    println!("{a} {b}");
    println!("{} {} {}", a==b, a==c, b==c); // true true true
}
DoubleHyphen commented 1 year ago

@pitaj Casting the pointers to integers retains this behaviour. Eggyal noticed as much too, as indicated by his “Likewise with usize” comment.

pitaj commented 1 year ago

@SaltyKitkat doesn't even need to be printing, I provided a case above that doesn't even directly use prints.

@DoubleHyphen yes I'm aware of those, my cases above are pretty much the same. I thought @Diggsey was saying they had one without any use of pointers

lunasophia commented 1 year ago

Maybe this is useful, please remove if not. I decided to test this on some less common platforms: macOS aarch64, OpenBSD aarch64 (i.e. a Raspberry Pi), FreeBSD x86_64. The output I get is the same across all these platforms, it's consistent, and the comparisons are consistently correct.

[(23:39:40) luna@anugraha ~/wtf] cargo run
   Compiling wtf v0.1.0 (/Users/luna/wtf)
    Finished dev [unoptimized + debuginfo] target(s) in 0.68s
     Running `target/debug/wtf`
6130756828 == 6130756844 evaluates to false
6130756828 == 6130756844 evaluates to false
[(23:39:43) luna@anugraha ~/wtf] cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/wtf`
6130592988 == 6130593004 evaluates to false
6130592988 == 6130593004 evaluates to false
[(23:39:43) luna@anugraha ~/wtf] cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/wtf`
6162787548 == 6162787564 evaluates to false
6162787548 == 6162787564 evaluates to false
[(23:39:46) luna@anugraha ~/wtf] cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/wtf`
6094712028 == 6094712044 evaluates to false
6094712028 == 6094712044 evaluates to false

Perhaps this is something Linux- or Linux/x86_64-specific?

inclyc commented 1 year ago

See upstream issue here: https://github.com/llvm/llvm-project/issues/45725. I think https://github.com/llvm/llvm-project/issues/45725#issuecomment-981029774 explains the actual problem.

chenfengyuan commented 1 year ago

Maybe this is useful, please remove if not. I decided to test this on some less common platforms: macOS aarch64, OpenBSD aarch64 (i.e. a Raspberry Pi), FreeBSD x86_64. The output I get is the same across all these platforms, it's consistent, and the comparisons are consistently correct.

[(23:39:40) luna@anugraha ~/wtf] cargo run
   Compiling wtf v0.1.0 (/Users/luna/wtf)
    Finished dev [unoptimized + debuginfo] target(s) in 0.68s
     Running `target/debug/wtf`
6130756828 == 6130756844 evaluates to false
6130756828 == 6130756844 evaluates to false
[(23:39:43) luna@anugraha ~/wtf] cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/wtf`
6130592988 == 6130593004 evaluates to false
6130592988 == 6130593004 evaluates to false
[(23:39:43) luna@anugraha ~/wtf] cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/wtf`
6162787548 == 6162787564 evaluates to false
6162787548 == 6162787564 evaluates to false
[(23:39:46) luna@anugraha ~/wtf] cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/wtf`
6094712028 == 6094712044 evaluates to false
6094712028 == 6094712044 evaluates to false

Perhaps this is something Linux- or Linux/x86_64-specific?

Did you try run the release build(cargo run -r) ?

lunasophia commented 1 year ago

Ah, no, my mistake, I apologize! I do see the behavior described above when I run it in release mode; feel free to mark as offtopic.

golddranks commented 1 year ago

/r/rust already managed to turn this into a segfault in safe code: https://www.reddit.com/r/rust/comments/112crfj/comment/j8ml9zh/?utm_source=reddit&utm_medium=web2x&context=3

matthieu-m commented 1 year ago

/r/rust already managed to turn this into a segfault in safe code: https://www.reddit.com/r/rust/comments/112crfj/comment/j8ml9zh/?utm_source=reddit&utm_medium=web2x&context=3

Inlining the code to avoid indirection/loss, credits to u/duckerude:

use std::cell::RefCell;

fn main() {
    let a = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let b = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let i = a - b;
    let arr = [
        RefCell::new(Some(Box::new(1))),
        RefCell::new(None),
        RefCell::new(None),
    ];
    assert_ne!(i, 0);
    let r = arr[i].borrow();
    let r = r.as_ref().unwrap();
    *arr[0].borrow_mut() = None;
    println!("{}", *r);
}
oli-obk commented 1 year ago

Same example, but without interior mutability:

fn main() {
    let a = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let b = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let i = a - b;
    let mut arr = [
        Some(Box::new(1)),
        None,
    ];
    assert_ne!(i, 0);
    let (a, b) = arr.split_at_mut(i);
    let r = b[0].as_ref().unwrap();
    a[0] = None;
    println!("{}", *r);
}
Janrupf commented 1 year ago

FWIW, inlining also changes the behavior. Run the following program in release mode:

#[inline(never)]
fn cmp(a: usize, b: usize) -> bool {
    a == b
}

#[inline(always)]
fn cmp_in(a: usize, b: usize) -> bool {
    a == b
}

fn main() {
    let a = {
        let v = 0;
        &v as *const _ as usize
    };
    let b = {
        let v = 0;
        &v as *const _ as usize
    };
    println!("{a:?} == {b:?} -> ==: {}, cmp_in: {}, cmp: {}", a==b, cmp_in(a, b), cmp(a, b));
    println!("{a:?} == {b:?} -> ==: {}, cmp_in: {}, cmp: {}", a==b, cmp_in(a, b), cmp(a, b));
}

Disabling inlining fixes the issue - presumably because the optimization context gets lost?

eddyb commented 1 year ago

Having seen similar things being discussed over the years, I'm surprised nobody has linked any of them here yet. Earliest I could find (though I feel like there were older discussions? maybe more informal, on IRC, though) was @RalfJung's own post from 2015: https://internals.rust-lang.org/t/comparing-dangling-pointers/3019

How much has changed since that post? My intuition was that, if the comparison result is i1 undef (or at least similar enough), then the observed behavior here is compatible with that. The post even states:

In particular, not only is the result of comparing a dangling pointers indeterminate and hence non-deterministic, it is actually kind of “lazily picked”: Multiple comparisons can have different results.

So what has changed since? Had LLVM promised to be more careful than that post was worried about? Was @RalfJung's initial analysis incomplete? I tried skimming the thread but most of it seemed to be people failing to exploit any undefs to get a miscompilation, at least in several different configurations.

nuiva commented 1 year ago

@eddyb: @eggyal demonstrated the issue without any pointer comparisons or operations on dangling pointers (namely, the pointers were cast to usize when still valid).

The root cause might be the same, but it seems different from a language semantics point of view.

ghost commented 1 year ago

Average Rust compiler quality. Should have used gccrs.

eddyb commented 1 year ago

namely, the pointers were cast to usize when still valid

To be perfectly honest, it still felt plausible that obtaining the address of a valid pointer can make that value "degrade" once the pointer goes dangling, but I suppose that's a bit ridiculous of me (and I don't think even the C or C++ standards allow that?).


Another thing I almost missed, in the UCG issue on the topic:

So I could totally see LLVM as not intending to allow such strange outcomes, but them only creeping in through optimizations that could then be considered bugs.


The root cause might be the same,

Sure, I just expected decade-old issues with very similar descriptions and was a bit surprised not to find them. (I'm not particularly advocating for anything, just curious of historical context in general)

RalfJung commented 1 year ago

From my side, the difference compared to back then is that from what I learned, LLVM does not have C's "pointer zapping". And while it is hard to get definite statements on LLVM semantics, I am also now convinced that ptr comparison in LLVM ignores provenance. The fact that no LLVM dev has argued that the current behavior is correct seems to confirm this.

matthiaskrgr commented 1 year ago

Hm, running this https://github.com/rust-lang/rust/issues/107975#issuecomment-1431758601 example with different rustc versions (cargo run --release)

1.40...1.64:

thread 'main' panicked at 'assertion failed: `(left != right)`
  left: `0`,
 right: `0`', src/main.rs:18:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

1.65:

thread 'main' panicked at 'already borrowed: BorrowMutError', src/main.rs:21:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

1.66..now

segmentation fault
DoubleHyphen commented 1 year ago

Hm, running this #107975 (comment) example with different rustc versions (cargo run --release)

1.40...1.64:

thread 'main' panicked at 'assertion failed: `(left != right)`
  left: `0`,
 right: `0`', src/main.rs:18:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

1.65:

thread 'main' panicked at 'already borrowed: BorrowMutError', src/main.rs:21:6
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

1.66..now

segmentation fault

Huh! How interesting. Did you try the one without interior mutability?

oli-obk commented 1 year ago

Huh! How interesting. Did you try the one without interior mutability?

that one starts segfaulting in 1.65 and has the left != right panic otherwise (or stops compiling due to assert_ne not existing before 1.8 or sth

the following segfaults from 1.0 to 1.7, panics from then on

fn main() {
    let a = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let b = {
        let v = 0u8;
        &v as *const _ as usize
    };
    let i = a - b;
    let mut arr = [
        Some(Box::new(1)),
        None,
    ];
    if i == 0 {
        panic!()
    }
    let (a, b) = arr.split_at_mut(i);
    let r = b[0].as_ref().unwrap();
    a[0] = None;
    println!("{}", *r);
}
JakobDegen commented 1 year ago

Yeah, these types of reproductions are pretty sensitive to optimization noise, so I'm not surprised that there isn't a single version that consistently reproduces across versions

matthiaskrgr commented 1 year ago

With debug assertions enabled I also get the 'already borrowed: BorrowMutError' thingy on master, so I guess it was an assertion introduced with 1.65 but then moved to debug assertion in 1.66 and forward :thinking:

eggyal commented 1 year ago

I know the discussion has moved beyond this, so I apologise for the noise but I wanted to share this variation on my earlier example that (no doubt due to inlining) still exhibits the behaviour:

fn f() -> usize {
    let v = 0;
    &v as *const _ as _
}

fn main() {
    let a = f();
    let b = f();
    println!("{a:?} == {b:?} evaluates to {}", a==b);
    println!("{a:?} == {b:?} evaluates to {}", a==b);
}

Output

140727719695976 == 140727719695976 evaluates to false
140727719695976 == 140727719695976 evaluates to true

Playground.