llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Incorrect code on AArch64 for call in a function with 'ghccc' calling convention #70577

Open waterlens opened 1 year ago

waterlens commented 1 year ago

The following program contains a call to printf inside a function f1 that uses the special calling convention ghccc, which has no callee-saved registers.

target triple = "aarch64-unknown-linux-gnu"
; target triple = "x86_64-unknown-linux-gnu"

@.str = private unnamed_addr constant [6 x i8] c"test\0A\00", align 1

; Function Attrs: noinline nounwind uwtable
define dso_local ghccc void @f1() #0 {
entry:
  %call = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str)
  ret void
}

; Function Attrs: nofree nounwind
declare noundef i32 @printf(ptr nocapture noundef readonly) local_unnamed_addr #2

; Function Attrs: nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #3 {
entry:
  call ghccc void @f1()
  %call = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str)
  ret i32 0
}

By its semantics, the expected output is:

test
test

In practice, with the latest version of LLVM, the program compiled for AArch64 prints test\n once and then enters an infinite loop.

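This happens because LLVM produces the following problematic assembly. f1 does not save x30 (the link register) before its inner call to printf, so bl printf overwrites x30 with the address of the ret that follows it, and that ret then keeps branching to itself.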

f1:                                     // @f1
        adrp    x0, .L.str
        add     x0, x0, :lo12:.L.str
        bl      printf
        ret
main:                                   // @main
        stp     d15, d14, [sp, #-160]!          // 16-byte Folded Spill
        stp     d13, d12, [sp, #16]             // 16-byte Folded Spill
        stp     d11, d10, [sp, #32]             // 16-byte Folded Spill
        stp     d9, d8, [sp, #48]               // 16-byte Folded Spill
        stp     x29, x30, [sp, #64]             // 16-byte Folded Spill
        stp     x28, x27, [sp, #80]             // 16-byte Folded Spill
        stp     x26, x25, [sp, #96]             // 16-byte Folded Spill
        stp     x24, x23, [sp, #112]            // 16-byte Folded Spill
        stp     x22, x21, [sp, #128]            // 16-byte Folded Spill
        stp     x20, x19, [sp, #144]            // 16-byte Folded Spill
        bl      f1
        adrp    x0, .L.str
        add     x0, x0, :lo12:.L.str
        bl      printf
        ldp     x20, x19, [sp, #144]            // 16-byte Folded Reload
        mov     w0, wzr
        ldp     x22, x21, [sp, #128]            // 16-byte Folded Reload
        ldp     x24, x23, [sp, #112]            // 16-byte Folded Reload
        ldp     x26, x25, [sp, #96]             // 16-byte Folded Reload
        ldp     x28, x27, [sp, #80]             // 16-byte Folded Reload
        ldp     x29, x30, [sp, #64]             // 16-byte Folded Reload
        ldp     d9, d8, [sp, #48]               // 16-byte Folded Reload
        ldp     d11, d10, [sp, #32]             // 16-byte Folded Reload
        ldp     d13, d12, [sp, #16]             // 16-byte Folded Reload
        ldp     d15, d14, [sp], #160            // 16-byte Folded Reload
        ret
.L.str:
        .asciz  "test\n"

I think LLVM needs to emit code that saves the return address register (x30) in this case. As background: a calling convention without callee-saved registers may be used to speed up hot paths, but such code sometimes has to call into a cold path that uses the usual calling convention.
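
For illustration, here is a hand-written sketch of what a correct f1 might look like. It is not output from any LLVM version, and the 16-byte stack slot is only an assumption chosen to keep sp 16-byte aligned; a real fix could lay out the frame differently.

```
f1:                                     // hypothetical, hand-written
        str     x30, [sp, #-16]!        // save the link register before the call
        adrp    x0, .L.str
        add     x0, x0, :lo12:.L.str
        bl      printf                  // overwrites x30 with the address of the next instruction
        ldr     x30, [sp], #16          // restore the original return address into main
        ret
```

With x30 restored, the final ret returns to main instead of branching back to itself.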

By the way, for target x86-64, the behavior of the generated program is correct.

llvmbot commented 1 year ago

@llvm/issue-subscribers-backend-aarch64

Author: Waterlens (waterlens)
