Suboptimal code generation for omittable bound checks

loskutov commented 2 years ago

Compiler Explorer link: https://godbolt.org/z/KMTfY18Wf

I tried this code:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0..n {
        assert!(d[i].len() == n);
        for j in 0..n {
            d[i][j] = 0;
        }
    }
}

With -C llvm-args=-enable-constraint-elimination, I expected the bound checks to be optimized out as they're performed manually by assert!. However, both assert and bound checks were present in the compiled code. The inner loop looks really dumb in terms of code generation:

.LBB0_5:
        cmp     rsi, rdx
        je      .LBB0_9
        mov     dword ptr [rcx + 4*rdx], 0
        add     rdx, 1
        cmp     rsi, rdx
        jne     .LBB0_5

rsi != rdx is the loop invariant, but it's verified on each iteration.

At first I suspected the issue to be entirely on the LLVM side, but bound checks are omitted in equivalent C++ code (which is also present at the Compiler Explorer link), and the whole inner loop is replaced with a single memset call.

Meta

rustc --version --verbose:

rustc 1.58.0-nightly (a77da2d45 2021-11-19)
binary: rustc
commit-hash: a77da2d454e6caa227a85b16410b95f93495e7e0
commit-date: 2021-11-19
host: x86_64-unknown-linux-gnu
release: 1.58.0-nightly
LLVM version: 13.0.0

the8472 commented 2 years ago

To get memset as in the clang version you can replace

        for j in 0..n {
            d[i][j] = 0;
        }

with

d[i][0..n].fill(0);

loskutov commented 2 years ago

The code example is pretty synthetic and doesn't make much sense on its own; however, similar effects can be observed in slightly more complicated programs, e.g. Floyd—Warshall algorithm.

My point is that the missed optimization is pretty important in real-life programs, as it's not uncommon to be unable to avoid array indexing, and bounds-checking overhead is often claimed to be negligible as the checks are likely to be optimized out. And of course I expect them to be optimized out when they are optimized out by clang in an equivalent C++ program.

leonardo-m commented 2 years ago

There are multiple ways to solve the problem, like:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0 .. n {
        let row = &mut d[i][.. n];
        for j in 0 .. n {
            row[j] = 0;
        }
    }
}

What's the best way to solve this problem depends on the specific situation.

loskutov commented 2 years ago

True. However, the issue is about poor codegen for this particular piece of code, not about looking for a way to rewrite it.

leonardo-m commented 2 years ago

Your original code contains an assert, its purpose is to tell something to the optimizer. Often that helps, but you can't be sure the compiler will use the information given by the assert to its fullest. It's a "best effort" for the compiler. So you are waiting for the compiler to do what you want based on "best efforts".

scottmcm commented 2 years ago

Thanks for including the vector::at C++ version!

I suggest that you prefer re-slicing to asserts for avoiding bounds checks. For example, this one works:

pub fn zero(d: &mut [Vec<i32>]) {
    let n = d.len();
    for i in 0..n {
        let di = &mut d[i][..n];
        for j in 0..n {
            di[j] = 0;
        }
    }
}

https://rust.godbolt.org/z/f19nasGPh

And that's a general strategy for improving bound-check-elimination. It's how the the new reverse implementation works, for example (#90821).

rust-lang / rust

Suboptimal code generation for omittable bound checks #91109

Meta