rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Unmerged stack slots under Windows #132014

Open xTachyon opened 1 month ago

xTachyon commented 1 month ago

https://godbolt.org/z/1foPhW5PT

Relevant bits (x86-64 assembly, then LLVM IR):

example::write_characteristics::hd20ef966b954cd90:
        sub     rsp, 200

define void @example::write_characteristics::hd20ef966b954cd90(i16 noundef %c) unnamed_addr {
start:
  %0 = alloca [16 x i8], align 8
  %1 = alloca [16 x i8], align 8
  %2 = alloca [16 x i8], align 8
  %3 = alloca [16 x i8], align 8
  %4 = alloca [16 x i8], align 8
  %5 = alloca [16 x i8], align 8
  %6 = alloca [16 x i8], align 8
  %7 = alloca [16 x i8], align 8
  %8 = alloca [16 x i8], align 8
  %9 = alloca [16 x i8], align 8

I think all the allocas should have been merged into one, or the code should be able to pass a pointer to a global constant holding the slice.
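The source behind the Godbolt link isn't reproduced here, so the following is a hypothetical sketch of the code shape involved: one call per flag bit, each with its own string-literal argument. The second idea can be approximated at the source level by moving the (mask, name) pairs into a global constant table and looping over it, which leaves a single call site. The names `emit` and `CHARACTERISTICS`, the flag names and the mask values are all made up. Whether the unrolled version reproduces the extra 16-byte allocas depends on how the callee receives its argument, and that is not verified here.

```rust
// Hypothetical sink; #[inline(never)] keeps each call as a real call.
#[inline(never)]
fn emit(out: &mut Vec<&'static str>, name: &'static str) {
    out.push(name);
}

// Unrolled shape: one call site (and potentially one argument slot) per flag.
// Flag names and masks are illustrative only.
pub fn write_characteristics_unrolled(out: &mut Vec<&'static str>, c: u16) {
    if c & 0x0001 != 0 { emit(out, "READ"); }
    if c & 0x0002 != 0 { emit(out, "WRITE"); }
    if c & 0x0004 != 0 { emit(out, "EXEC"); }
    if c & 0x0008 != 0 { emit(out, "SHARED"); }
}

// Table-driven shape: the pairs live in a global constant table,
// and a loop gives a single call site.
static CHARACTERISTICS: [(u16, &str); 4] = [
    (0x0001, "READ"),
    (0x0002, "WRITE"),
    (0x0004, "EXEC"),
    (0x0008, "SHARED"),
];

pub fn write_characteristics_table(out: &mut Vec<&'static str>, c: u16) {
    for &(mask, name) in CHARACTERISTICS.iter() {
        if c & mask != 0 {
            emit(out, name);
        }
    }
}

fn main() {
    let mut a = Vec::new();
    let mut b = Vec::new();
    write_characteristics_unrolled(&mut a, 0b1011);
    write_characteristics_table(&mut b, 0b1011);
    // Both shapes produce the same sequence of calls.
    assert_eq!(a, b);
    println!("{:?}", b); // ["READ", "WRITE", "SHARED"]
}
```

The table version trades straight-line code for a loop. It is a workaround at the source level and does not change what rustc emits for the unrolled form.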

xTachyon commented 1 month ago

For the first idea, I'd expect codegen similar to this: https://godbolt.org/z/5M8jG5orf
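As a rough source-level picture of the first idea, the following version reuses a single local variable as the argument slot for every call, which is roughly what merged stack slots amount to. It is only an illustration. `emit_ref` and the flag names are hypothetical, and the callee takes `&&'static str` purely to force the argument into memory. The compiler is still free to lay out the stack differently.

```rust
// Hypothetical callee that takes its argument by reference, so the
// caller must materialize the &str in memory before each call.
#[inline(never)]
fn emit_ref(out: &mut Vec<&'static str>, name: &&'static str) {
    out.push(*name);
}

pub fn write_characteristics_one_slot(out: &mut Vec<&'static str>, c: u16) {
    // One stack variable reused for every call: each value is dead once
    // the call that reads it returns, so a single slot is enough.
    let mut slot: &'static str;
    if c & 0x0001 != 0 { slot = "READ"; emit_ref(out, &slot); }
    if c & 0x0002 != 0 { slot = "WRITE"; emit_ref(out, &slot); }
    if c & 0x0004 != 0 { slot = "EXEC"; emit_ref(out, &slot); }
    if c & 0x0008 != 0 { slot = "SHARED"; emit_ref(out, &slot); }
}

fn main() {
    let mut out = Vec::new();
    write_characteristics_one_slot(&mut out, 0b0110);
    println!("{:?}", out); // ["WRITE", "EXEC"]
}
```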

bjorn3 commented 1 month ago

I think what happens is that at the MIR level each call argument is a constant operand that is never stored into a temporary variable. As a result, no StorageLive and StorageDead MIR statements are emitted, so LLVM is never told that the lifetimes of the stack slots don't overlap.
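The following toy model shows why missing lifetime information matters. It is loosely in the spirit of LLVM's StackColoring pass but greatly simplified, and it is not that pass's actual algorithm. Each alloca gets a live interval `[start, end)` in instruction indices, and allocas whose intervals don't overlap may share one slot. Without lifetime information, every alloca has to be treated as live for the whole function, so ten allocas need ten slots. With one short interval per call, they fit into one. `assign_slots` and the interval numbers are illustrative assumptions.

```rust
/// Assigns each alloca (given as a half-open live interval [start, end))
/// to a physical slot so that overlapping intervals never share a slot.
/// Returns (slot index per alloca, number of slots used).
fn assign_slots(intervals: &[(usize, usize)]) -> (Vec<usize>, usize) {
    // Visit allocas in order of interval start (classic interval partitioning).
    let mut order: Vec<usize> = (0..intervals.len()).collect();
    order.sort_by_key(|&i| intervals[i].0);

    // For each slot, the end of the interval most recently placed in it.
    let mut slot_free_at: Vec<usize> = Vec::new();
    let mut assignment = vec![0; intervals.len()];
    for i in order {
        let (start, end) = intervals[i];
        // Reuse the first slot whose previous occupant is already dead.
        let free_slot = slot_free_at.iter().position(|&free| free <= start);
        match free_slot {
            Some(s) => {
                slot_free_at[s] = end;
                assignment[i] = s;
            }
            None => {
                slot_free_at.push(end);
                assignment[i] = slot_free_at.len() - 1;
            }
        }
    }
    let used = slot_free_at.len();
    (assignment, used)
}

fn main() {
    // No lifetime info: all 10 allocas are live across the whole function.
    let whole: Vec<(usize, usize)> = vec![(0, 20); 10];
    // With lifetime info: alloca k is live only around call k.
    let scoped: Vec<(usize, usize)> = (0..10).map(|k| (2 * k, 2 * k + 2)).collect();
    println!("{}", assign_slots(&whole).1); // 10
    println!("{}", assign_slots(&scoped).1); // 1
}
```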

xTachyon commented 1 month ago

Rustc emitting lifetime start/end markers would be great, but even without them, I don't see why LLVM can't figure out on its own that it can merge them. The function argument is nocapture, so each alloca should be dead after its call. At least that's my understanding of it.
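The nocapture argument can be modeled in a similar way. If a callee is known not to retain the pointer, a slot's live range can end at the last call that uses it. If the pointer might be captured, the slot has to be assumed live until the function returns. The toy instruction stream below computes the peak number of simultaneously live slots, which is a lower bound on how many stack slots are needed. `Inst`, `live_ranges`, `max_live` and `store_then_call` are hypothetical names. The sketch only models the reasoning in this comment, not how LLVM actually infers lifetimes.

```rust
#[derive(Clone, Copy)]
enum Inst {
    /// Write a value into stack slot `slot`.
    Store { slot: usize },
    /// Pass a pointer to `slot` to a callee; `nocapture` means the callee
    /// is known not to keep the pointer after returning.
    Call { slot: usize, nocapture: bool },
}

/// Inclusive live range [first_def, last_use] for each slot.
fn live_ranges(code: &[Inst], num_slots: usize) -> Vec<(usize, usize)> {
    let end = code.len(); // index standing for "function return"
    let mut ranges = vec![(usize::MAX, 0); num_slots];
    for (i, inst) in code.iter().enumerate() {
        match *inst {
            Inst::Store { slot } => {
                ranges[slot].0 = ranges[slot].0.min(i);
                ranges[slot].1 = ranges[slot].1.max(i);
            }
            Inst::Call { slot, nocapture } => {
                // A captured pointer may be used later, so keep the slot
                // live until the function returns.
                let last = if nocapture { i } else { end };
                ranges[slot].1 = ranges[slot].1.max(last);
            }
        }
    }
    ranges
}

/// Largest number of slots live at the same program point.
fn max_live(ranges: &[(usize, usize)], len: usize) -> usize {
    (0..=len)
        .map(|i| ranges.iter().filter(|&&(s, e)| s <= i && i <= e).count())
        .max()
        .unwrap_or(0)
}

/// `n` slots, each written and then passed to one call, like the issue's IR.
fn store_then_call(n: usize, nocapture: bool) -> Vec<Inst> {
    (0..n)
        .flat_map(|k| [Inst::Store { slot: k }, Inst::Call { slot: k, nocapture }])
        .collect()
}

fn main() {
    for nocapture in [true, false] {
        let code = store_then_call(10, nocapture);
        let peak = max_live(&live_ranges(&code, 10), code.len());
        println!("nocapture = {nocapture}: at most {peak} slot(s) live at once");
    }
}
```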

bjorn3 commented 1 month ago

It turns out LLVM does infer lifetime start/end intrinsics here, yet it still fails to overlap the stack slots.

DianQK commented 1 month ago

I'm not sure if this is what you're looking for: https://godbolt.org/z/qh8ov63hb. I don't know whether this actually counts as an optimization.