Code sample: nalgebra's swap_rows

rust-lang / rust-memory-model

Collecting examples and information to help design a memory model for Rust.

Apache License 2.0

126 stars 15 forks source link

Code sample: nalgebra's swap_rows #50

Closed HadrienG2 closed 6 years ago

HadrienG2 commented 6 years ago

The nalgebra linear algebra library has a swap_rows method which allows someone to swap two rows of a matrix. Unfortunately, I'm currently encountering an UB-smelling issue which causes data corruption in some circumstances and seems closely related to that method.

The implementation of swap_rows is a loop that iterates over the container cells associated with the source and destination rows of the matrix, then creates two &mut from pairs of cells, and calls mem::swap on the pairs. One of the things which @sebcrozet noticed while experimenting around is that replacing this mem::swap with a ptr::swap seems to make the UB go away. The issue is highly sensitive to many factors, though, so we can't tell for sure if the bug is really gone with that.

One suspicion of mine is that the two &mut which swap_rows create on the same array slice might be consider by either rustc or LLVM to alias, even if from our perspective they don't (as they point to different scalars), which would trigger UB. What is your opinion on this?

HadrienG2 commented 6 years ago

Here is a minimal program which is spiritually equivalent to the problematic part of my nalgebra-based application, and exhibits the same UB symptoms on my machine:

extern crate nalgebra;

use nalgebra::Matrix3x4;

fn swappy() -> Matrix3x4<f32> {
    let mut mat = Matrix3x4::new(1., 2.,  3.,  4.,
                                 5., 6.,  7.,  8.,
                                 9., 10., 11., 12.);

    // NOTE: This printf makes the bug go away, suggesting UB or a codegen issue
    // println!("Result: {}", mat);

    for i in 0..2 {
        for j in i+1..3 {
            if mat[(j, 3)] > mat[(i, 3)] { mat.swap_rows(i, j); }
        }
    }

    mat
}

fn main() {
    let mat = swappy();
    println!("Result: {}", mat);
}

Expected output:

  ┌             ┐
  │  9 10 11 12 │
  │  5  6  7  8 │
  │  1  2  3  4 │
  └             ┘

Actual output (nalgebra 0.16.2, rustc 1.29, Ivy Bridge & Haswell CPUs, release build & codegen-units=1):

Result: 
  ┌             ┐
  │  9 10 11 12 │
  │  5  6  7  8 │
  │  1  6  7  4 │
  └             ┘

nagisa commented 6 years ago

To summarize in short:

let object: StorageMut<T> = something;
let cell1: *mut T = object.ptr_mut().offset(index1);
let cell2: *mut T = object.ptr_mut().offset(index2);
std::mem::swap(&mut *cell1, &mut *cell2);

Given that index1 != index2, I agree that there should be no aliasing issues here.

That being said, I cannot reproduce with the given sample program, and thus, cannot investigate. I tried with all nalgebra versions from 0.15.3 to 0.16.2.

HadrienG2 commented 6 years ago

Upon further investigation, there are two ingredients which I forgot to mention so far:

Must build in release mode (as with most UB-ish issues)
Must have codegen-units=1 (it seems that one of the optimizations that are inhibited by multiple codegen units is needed for this to fail)

HadrienG2 commented 6 years ago

Let me correct myself: it still fails with codegen-units=2, but not with codegen-units=3... So it's not the use of multiple codegen units per se that is problematic, but rather the specific way these were sliced up by rustc. I would therefore still recommend use of codegen-units=1 to be safe.

nagisa commented 6 years ago

This is most likely a bug introduced by upgrade from LLVM 7.0 to LLVM 8.0. Please report this issue on rust-lang/rust.

HadrienG2 commented 6 years ago

Thanks for confirming that two non-overlapping &mut to a single slice are not UB in Rust. I'll report it as a compiler bug then ;)