Return value of `inline` functions called from `.Naked` functions causes stack allocations

zeroZshadow commented 3 months ago

Zig Version

0.14.0-dev.1298+d9e8671d9

Steps to Reproduce and Observed Behavior

Compiling the following snippet with -ODebug -target powerpc-freestanding-eabi:

export fn someFunc() callconv(.Naked) noreturn {
    write(read());
    asm volatile ("rfi");
}

pub inline fn read() u32 {
    return asm volatile ("mflr %[r]"
        : [r] "=r" (-> u32),
    );
}

pub inline fn write(value: u32) void {
    asm volatile ("mtlr %[r]"
        :
        : [r] "r" (value),
    );
}

Output:

someFunc:
        mflr    3
        stw 3, -4(1)
        b .LBB0_1
.LBB0_1:
        lwz 3, -4(1)
        mtlr    3
        b .LBB0_2
.LBB0_2:
        trap

The output contains two stack operations: stw stores the return value of read() to the stack, and lwz loads it back for use with write(). This only happens in Debug mode.

Expected Behavior

No stack reads or writes.

Manually inlining the two functions produces:

export fn someFunc2() callconv(.Naked) noreturn {
    const lr = asm volatile ("mflr %[r]"
        : [r] "=r" (-> u32),
    );

    asm volatile ("mtlr %[r]"
        :
        : [r] "r" (lr),
    );
}

someFunc2:
        mflr    3
        mtlr    3
        trap

alexrp commented 3 months ago

These stack allocations are not coming from us:

# Begin Function AIR: ppc.someFunc:
# Total AIR+Liveness bytes: 463B
# AIR Instructions:         15 (135B)
# AIR Extra Data:           47 (188B)
# Liveness tomb_bits:       8B
# Liveness Extra Data:      3 (12B)
# Liveness special table:   2 (16B)
  %0!= save_err_return_trace_index()
  %1!= dbg_stmt(2:15)
  %3 = dbg_inline_block(u32, <fn () callconv(.Inline) u32, (function 'read')>, {
    %4!= dbg_stmt(2:5)
    %5 = assembly(u32, volatile, [r] -> =r, "mflr %[r]")
    %6!= dbg_stmt(2:5)
    %7!= br(%3, %5!)
  })
  %2!= dbg_inline_block(void, <fn (u32) callconv(.Inline) void, (function 'write')>, {
    %8!= dbg_arg_inline(%3, "value")
    %9!= dbg_stmt(2:5)
    %10!= assembly(void, volatile, [r] in r = (%3!), "mtlr %[r]")
    %11!= br(%2, @Air.Inst.Ref.void_value)
  } %3!)
  %12!= dbg_stmt(3:5)
  %13!= assembly(void, volatile, "rfi")
  %14!= trap()
# End Function AIR: ppc.someFunc

; Function Attrs: naked noredzone noreturn nounwind uwtable nosanitize_coverage skipprofile
define dso_local void @someFunc() #0 !dbg !6 {
Entry:
  %0 = call i32 asm sideeffect "mflr ${0}", "=r"(), !dbg !7
  br label %Block, !dbg !7

Block:
  %1 = phi i32 [ %0, %Entry ], !dbg !8
  call void @llvm.dbg.value(metadata i32 %1, metadata !10, metadata !DIExpression()), !dbg !9
  call void asm sideeffect "mtlr ${0}", "r"(i32 %1), !dbg !11
  br label %Block1, !dbg !11

Block1:
  call void asm sideeffect "rfi", ""(), !dbg !12
  call void @llvm.trap(), !dbg !12
  unreachable, !dbg !12
}

In the general case, there is nothing we can do about this; an LLVM backend is free to introduce stack accesses if it likes. This is why both GCC and Clang consider asm statements to be the only valid contents of naked functions: that is the only way to guarantee that there is no surprising codegen like this.

This is why I've talked about a new language rule requiring naked functions to reduce to only asm expressions after aggressive comptime evaluation of the entire function body, in a similar fashion to how container-level comptime blocks work.
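
As a rough illustration of the "only asm" shape described above, the reproduction could be collapsed into a single asm expression, so that no Zig-level value has to travel between two asm statements. This is a minimal sketch, not a tested fix: the name someFunc3 is hypothetical, r3 is hard-coded as a scratch register on the assumption that nothing else needs it at this point, and the syntax follows the 0.14.0-dev version used in this issue.

```zig
// Hypothetical variant of someFunc: the whole body is one asm expression,
// so there is no intermediate u32 for the backend to spill to the stack.
export fn someFunc3() callconv(.Naked) noreturn {
    asm volatile (
        \\mflr 3
        \\mtlr 3
        \\rfi
    );
}
```

The trade-off is that registers are picked by hand instead of by the register allocator, and the read()/write() helpers can no longer be reused as functions.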

zeroZshadow commented 3 months ago

Sorry, I should have clarified: these stack allocations are coming from the debug information. We previously had this for inline function parameters as well, but some recent work fixed that?

So that would likely be call void @llvm.dbg.value(metadata i32 %1, metadata !10, metadata !DIExpression()), !dbg !9 from your IR output. I'm not sure why LLVM allows these in .Naked functions.

alexrp commented 3 months ago

Yes, I fixed a case where Zig itself was emitting an alloca + store for an llvm.dbg.declare() call in naked functions. But if the LLVM PowerPC backend is turning llvm.dbg.value() calls into stack allocations... well, there's really not a lot we can do about that short of completely omitting debug info in naked functions.

I can make that change if we want it, but we're fundamentally treating symptoms here. At the end of the day, an LLVM backend is still allowed to create stack accesses for non-asm code in naked functions. It's just down to luck and/or regalloc behavior when it does.

zeroZshadow commented 3 months ago

After a quick chat, we've found that the issue is not the @llvm.dbg.value statements, but the blocks generated for the called inline function.

Removing the block and its related parts generates the same IR as the manually inlined version, and no stack allocation takes place.

define dso_local void @someFunc() naked noinline {
Entry:
  %0 = call i32 asm sideeffect "mflr ${0}", "=r"()

  call void asm sideeffect "mtlr ${0}", "r"(i32 %0)
  br label %Block1

Block1:
  call void @llvm.trap()
  unreachable
}

someFunc:                               # @someFunc
        mflr    3
        mtlr    3
        b .LBB0_1
.LBB0_1:                                # %Block1
        trap

EDIT:

Just to clarify, this also happens on x86 and other targets. I just happen to code on PowerPC.
