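I think LLVM needs to emit code that saves the return address register (`x30`, the link register on AArch64) in this case. On AArch64, `bl` writes the return address into `x30` rather than onto the stack. Any function that makes a call must therefore preserve `x30` itself, even under a calling convention with no callee-save registers. One possible use case: we may use a special calling convention without callee-save registers to speed up hot paths, but occasionally need to go into a cold path that uses the usual calling convention.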
By the way, on x86-64 the generated program behaves correctly, since `call` there pushes the return address onto the stack instead of into a register.
The following program calls `printf` from a function `f1` that uses the special calling convention `ghccc` (which has no callee-save registers):
```llvm
target triple = "aarch64-unknown-linux-gnu"
; target triple = "x86_64-unknown-linux-gnu"
@.str = private unnamed_addr constant [6 x i8] c"test\0A\00", align 1
; Function Attrs: noinline nounwind uwtable
define dso_local ghccc void @f1() #0 {
entry:
%call = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str)
ret void
}
; Function Attrs: nofree nounwind
declare noundef i32 @printf(ptr nocapture noundef readonly, ...) local_unnamed_addr #2
; Function Attrs: nounwind uwtable
define dso_local i32 @main() local_unnamed_addr #3 {
entry:
call ghccc void @f1()
%call = call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str)
ret i32 0
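}

attributes #0 = { noinline nounwind uwtable }
attributes #2 = { nofree nounwind }
attributes #3 = { nounwind uwtable }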
```
The expected output is:
```
test
test
```
However, when compiled for AArch64 with the latest LLVM, the program prints `test` once and then enters an infinite loop.
This is because LLVM generates the following assembly, in which `f1` does not save `x30` before its call to `printf`. `bl printf` overwrites `x30` with the address of the `ret` that follows it. After `printf` returns, `f1`'s `ret` therefore keeps jumping back to itself.
```
f1: // @f1
adrp x0, .L.str
add x0, x0, :lo12:.L.str
bl printf
ret
main: // @main
stp d15, d14, [sp, #-160]! // 16-byte Folded Spill
stp d13, d12, [sp, #16] // 16-byte Folded Spill
stp d11, d10, [sp, #32] // 16-byte Folded Spill
stp d9, d8, [sp, #48] // 16-byte Folded Spill
stp x29, x30, [sp, #64] // 16-byte Folded Spill
stp x28, x27, [sp, #80] // 16-byte Folded Spill
stp x26, x25, [sp, #96] // 16-byte Folded Spill
stp x24, x23, [sp, #112] // 16-byte Folded Spill
stp x22, x21, [sp, #128] // 16-byte Folded Spill
stp x20, x19, [sp, #144] // 16-byte Folded Spill
bl f1
adrp x0, .L.str
add x0, x0, :lo12:.L.str
bl printf
ldp x20, x19, [sp, #144] // 16-byte Folded Reload
mov w0, wzr
ldp x22, x21, [sp, #128] // 16-byte Folded Reload
ldp x24, x23, [sp, #112] // 16-byte Folded Reload
ldp x26, x25, [sp, #96] // 16-byte Folded Reload
ldp x28, x27, [sp, #80] // 16-byte Folded Reload
ldp x29, x30, [sp, #64] // 16-byte Folded Reload
ldp d9, d8, [sp, #48] // 16-byte Folded Reload
ldp d11, d10, [sp, #32] // 16-byte Folded Reload
ldp d13, d12, [sp, #16] // 16-byte Folded Reload
ldp d15, d14, [sp], #160 // 16-byte Folded Reload
ret
.L.str:
.asciz "test\n"
```
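For comparison, here is a hand-written sketch (not actual LLVM output) of the minimum `f1` would need in order to return to `main` correctly. It assumes that spilling `x30` to the stack is acceptable under `ghccc`. It also relies on `sp` being 16-byte aligned on entry, which holds here because `main` adjusts `sp` by 160 bytes.

```
f1:                                     // hand-written sketch, not LLVM output
        str     x30, [sp, #-16]!        // spill the return address into main; keeps sp 16-byte aligned
        adrp    x0, .L.str
        add     x0, x0, :lo12:.L.str
        bl      printf                  // overwrites x30 with the address of the next instruction
        ldr     x30, [sp], #16          // restore the return address into main
        ret                             // returns through x30 to main
```

A real fix would more likely save the pair `x29, x30` and set up a frame record, as `main` does in its prologue. The single-register spill above only shows what is strictly required.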