rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
97.08k stars 12.54k forks source link

Unnecessary alloca Without Optimization Flags #129282

Open CJacob314 opened 1 month ago

CJacob314 commented 1 month ago

The following code, when compiled without optimization flags (or with -Copt-level=0) emits LLVM IR for an 8192-byte alloca, which can easily cause 100% unnecessary stack overflow errors in a running executable.

fn test<const SIZE: usize>() {
  if SIZE < 4096 {
    let arr = [0u8; SIZE];
    std::hint::black_box(&arr);
  } else {
    let vec = vec![0u8; SIZE];
    std::hint::black_box(&vec);
  }
}

fn main() {
    test::<8192>();
}

The following is the relevant LLVM IR:

define internal void @example::test::hb882c86c9d7a582d() unnamed_addr #1 personality ptr @rust_eh_personality !dbg !546 {
start:
  %0 = alloca [16 x i8], align 8
  %vec = alloca [24 x i8], align 8
  %arr = alloca [8192 x i8], align 1
  br label %bb2, !dbg !548

%arr only ends up being used by bb1 below, but bb1 has no predecessors:

bb1:                                              ; No predecessors!
  call void @llvm.memset.p0.i64(ptr align 1 %arr, i8 0, i64 8192, i1 false), !dbg !555
; call core::hint::black_box
  %_3 = call align 1 ptr @core::hint::black_box::h61dc8dd57d9b7993(ptr align 1 %arr), !dbg !556
  br label %bb5, !dbg !556
}

Here's a Godbolt Compiler Explorer link with all of the IR.

CJacob314 commented 1 month ago

@saethlin informed me that this is an issue with his mono-reachable traversal implementation.

saethlin commented 1 month ago

I know that when I initially worked on this I searched the generated LLVM IR for the special "No predecessors!" comment that LLVM leaves on such orphaned basic blocks. There are a lot and I think I concluded they were all not actionable.

Clearly this case is.

There are actually two bugs here:

  1. The mono-reachable traverasal only looks for a constant SwitchInt param coming from Rvalue::Use and Rvalue::UbChecks. In this case, we have Rvalue::BinOp. GVN could fix this, but only if test were inlined or we had post-mono MIR optimizations. So I lean towards it being worthwhile to teach mono-reachable traversal about this.

  2. Even if we get the traversal right (this can be simulated by wrapping the comparison in an inline const) we still have an alloca for the argument, because we don't set up locals based on reachability.

scottmcm commented 1 month ago

I think that if you want this you should write

 fn test<const SIZE: usize>() {
-  if SIZE < 4096 {
+  if const { SIZE < 4096 } {

so that rust will know it's a constant, and be more likely to give you the behaviour you want in unoptimized builds.

(Or make your own trait so it can be <[u8; 4096]>::IS_SMALL or something if you need an older MSRV.)

Anything else will just mean exactly the same bug comes back if you write it as SIZE.ilog2() < 12 instead, but if you write const { SIZE.ilog2() < 12 } that'd be fine again.

CJacob314 commented 1 month ago

@scottmcm That doesn't work, either. See version 1.80.0 or version 1.82.0-nightly on Compiler Explorer.