Differences between LLVM const-eval of floats and runtime behaviour can cause miscompilations

beetrees commented 4 months ago

I tried this code (based on an example from another issue, which is lightly adapted from @comex's example on a different issue):

#[inline(never)]
fn print_vals(x: f64, i: usize, vals_i: u32) {
    println!("x={x} i={i} vals[i]={vals_i}");
}

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    // Loop variables:
    let mut x: f64 = 1.0; // x = x.sin() every time
    let mut i: usize = 0; // increments by 2 every time

    while x != 0.17755388399451055 {
        // LLVM will do a brute-force evaluation of this loop for up to 100
        // iterations to try to calculate an iteration count.  (See
        // `llvm/lib/Analysis/ScalarEvolution.cpp`.)  Under host floating
        // point semantics (on x86_64-unknown-linux-gnu), `x` will equal exactly
        // 0.17755388399451055 after 90 iterations; LLVM discovers this by
        // brute-force evaluation and concludes that the iteration count is
        // always 90.

        // Now, if this loop executes 90 times, then `i` must be in the range
        // `0..180`, so the bounds check in `vals[i]` should always pass, so
        // LLVM eliminates it.
        print_vals(x, i, vals[i]);

        // Update `x`.  The exact computation doesn't matter that much; it just
        // needs to:
        //   (a) be possible to constant-evaluate by brute force (i.e. by going
        //       through each iteration one at a time);
        //   (b) be too complex for IndVarSimplifyPass to simplify *without*
        //       brute force;
        //   (b) differ depending on the current platforms floating-point math
        //       implementation.
        // `sin` is one such function.
        x = x.sin();

        // Update `i`, the integer we use to index into `vals`.  Why increment
        // by 2 instead of 1?  Because if we increment by 1, then LLVM notices
        // that `i` happens to be equal to the loop count, and therefore it can
        // replace the loop condition with `while i != 90`.  With `i` as-is,
        // LLVM could hypothetically replace the loop condition with
        // `while i != 180`, but it doesn't.
        i += 2;

    }
}

pub fn main() {
    // Make an array on the stack:
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 { vals[i as usize] = i; }
    evil(&vals);
}

I expected to see this happen: The 100% safe code never segfaults when compiled with optimisations.

Instead, this happened: When cross-compiled with optimisations to a platform with a sin implementation that does not produce identical results to the platform on which the code is being was compiled, the resulting binary will segfault. I discovered this by cross-compiling from x86_64-unknown-linux-gnu to x86_64-pc-windows-msvc and running the resulting binary with wine, but any pair of platforms with differing sin implementations will do.

Meta

rustc --version --verbose:

rustc 1.77.2 (25ef9e3d8 2024-04-09)
binary: rustc
commit-hash: 25ef9e3d85d934b27d9dada2f9dd52b1dc63bb04
commit-date: 2024-04-09
host: x86_64-unknown-linux-gnu
release: 1.77.2
LLVM version: 17.0.6

beetrees commented 4 months ago

@rustbot label +A-floating-point +A-llvm +I-unsound +A-cross

apiraino commented 4 months ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-high

saethlin commented 4 months ago

This different sin implementations problem was previously reported as https://github.com/rust-lang/rust/issues/109118

ds84182 commented 4 months ago

Shouldn't an LLVM issue be filed for this? Ideally the results of transcendentals should be represented by a "fuzzy" range, making compile time float equality impossible in this case.

oli-obk commented 4 months ago

https://github.com/llvm/llvm-project/issues/89885 was filed on the llvm side

RalfJung commented 4 months ago

Ah, fun. So LLVM assumes that sin is deterministic. Yeah that's clearly incoherent.

Now I wonder if the non-determinism in NaNs (as specified in https://github.com/rust-lang/rfcs/pull/3514) is subject to similar issues. If LLVM assumes float operations are deterministic then indeed it must not do any const-folding that differs from the effective runtime behavior.

beetrees commented 4 months ago

@RalfJung Non-deterministic NaNs can cause miscompilations. Here is an example that works on x86-64 (playground):

#[inline(never)]
fn print_vals(x: f64, i: usize, vals_i: u32) {
    println!("x={x} i={i} vals[i]={vals_i}");
}

#[inline(never)]
pub fn evil(vals: &[u32; 300]) {
    const INC: f64 = f64::MAX / 90.0;
    let mut x: f64 = -1.0;
    let mut i: usize = 0;

    while x.is_sign_negative() {
        print_vals(x, i, vals[i]);
        // Use a big decrement to reach negative infinity quickly.
        x -= INC;
        // Convert infinity into a NaN (Inf - Inf = NaN)
        x = x + x - x;
        i += 2;
    }
}

pub fn main() {
    let mut vals: [u32; 300] = [0; 300];
    for i in 0..300 { vals[i as usize] = i; }
    evil(&vals);
}

On x86-64 (and 32-bit x86), NaNs produced at runtime will always have a negative sign whereas NaNs produced by LLVM at compile time always have a positive sign. This means LLVM thinks the loop will exit when x becomes a NaN, but at runtime the loop will never exit.

RalfJung commented 4 months ago

@beetrees thanks for constructing an example!

pnkfelix commented 2 months ago

cc https://github.com/rust-lang/rfcs/pull/3514 cc https://github.com/rust-lang/lang-team/issues/273, https://github.com/rust-lang/lang-team/issues/222

tgross35 commented 1 month ago

https://github.com/llvm/llvm-project/pull/90942 will be in https://github.com/rust-lang/rust/pull/127513, so some of this may be improved soon.

beetrees commented 4 weeks ago

I've tested both the examples and can confirm that this issue has been fixed by #127513.

RalfJung commented 3 weeks ago

Awesome, let's close this then. :)

tgross35 commented 3 weeks ago

Per https://github.com/llvm/llvm-project/issues/89885#issuecomment-2119711089 that was only a partial fix, are we still exposing this in some way?

RalfJung commented 3 weeks ago

I consider the rest to be part of https://github.com/rust-lang/rust/issues/114479

rust-lang / rust

Differences between LLVM const-eval of floats and runtime behaviour can cause miscompilations #124364

Meta