rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.53k stars 12.47k forks source link

Noop loop is only optimized away when the range is half-open #123845

Open krtab opened 4 months ago

krtab commented 4 months ago
#[no_mangle]
pub fn f() {
    for _ in 0..=100 {
        ()
    }
}

#[no_mangle]
pub fn g() {
    for _ in 0..100 {
        ()
    }
}

with -O gives

f:
        xor     ecx, ecx
        mov     eax, 100
.LBB0_1:
        lea     edx, [rcx + 1]
        cmp     ecx, 100
        mov     ecx, edx
        cmovge  ecx, eax
        jge     .LBB0_3
        cmp     ecx, 101
        jl      .LBB0_1
.LBB0_3:
        ret

g:
        ret

Meta

Exists in both 1.77 and nightly.

Backtrace

``` ```

@rustbot label +A-codegen +A-LLVM +C-optimization +I-slow +S-has-mcve +I-heavy +T-compiler -needs-triage

eggyal commented 4 months ago

Not an excuse, but an explanation: inclusive ranges also track whether their iteration has been exhausted. That is, 100..100 is already exhausted but 100..=100 must yield once and then stop.

See https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=fb90029bc08c3d93cbe007ba596751f4

Not sure how easy it will be to identify when such tracking can be optimised away.

cuviper commented 4 months ago

FWIW, internal iteration does optimize away the inclusive range:

#[no_mangle]
pub fn f() {
    (0..=100).for_each(|_| ());
}

#[no_mangle]
pub fn g() {
    for _ in 0..100 {
        ()
    }
}
f:                                      # @f
# %bb.0:
    ret
                                        # -- End function
.set g, f
the8472 commented 4 months ago

This optimizes better. https://rust.godbolt.org/z/sYabYYKn4 Might be worth a try to see if it helps non-empty loop bodies. But TrustedStep is insufficient to specialize the integer ranges like this, it'd require something more targeted.

#[no_mangle]
pub fn f() {
    let r = RangeInclusive { start: 0, end: 100, exhausted: false };
    for _ in r {
        ()
    }
}

#[no_mangle]
pub fn g() {
    for _ in 0usize..=100 {
        ()
    }
}

struct RangeInclusive {
    start: usize,
    end: usize,
    exhausted: bool
}

impl Iterator for RangeInclusive {
    type Item = usize;

    fn next(&mut self) -> Option<Self::Item> {
        if self.exhausted || self.start > self.end {
            return None
        }

        let v = self.start;

        // == Current impl
        // self.exhausted = if v < self.end {
        //     self.start = unsafe { v.unchecked_add(1) };
        // } else {
        //     self.exhausted = true;
        // };

        // == alternative
        {
            self.start = v.wrapping_add(1);
            if v >= self.end {
                self.exhausted = true;
            }
        }

        Some(v)
    }
}
nikic commented 4 months ago

Godbolt: https://rust.godbolt.org/z/7z8n4broM

The reason this is not optimized is probably undef rules. We likely don't see that %iter.sroa.0.06 isn't undef due to the loop and then can't use the range information.

Could be avoided with this transform: https://alive2.llvm.org/ce/z/7XRLAT Which looks like https://github.com/llvm/llvm-project/issues/82414.