Open honggyukim opened 3 years ago
It might be helpful to compare the binaries for both compilers.
$ gcc -pg -O1 -o float.gcc float.c
$ clang -pg -O1 -o float.clang float.c
$ clang -pg -O0 -o float.O0.clang float.c
Here is the float_add
function of gcc compiled binary(float.gcc
).
00000000004006a8 <float_add>:
4006a8: a9be7bfd stp x29, x30, [sp, #-32]!
4006ac: 910003fd mov x29, sp
4006b0: 6d0127e8 stp d8, d9, [sp, #16]
4006b4: 1e204008 fmov s8, s0
4006b8: 1e204029 fmov s9, s1
4006bc: d50320ff xpaclri
4006c0: aa1e03e0 mov x0, x30
4006c4: 97ffff97 bl 400520 <_mcount@plt>
4006c8: 1e292908 fadd s8, s8, s9
4006cc: 90000100 adrp x0, 420000 <__cxa_atexit@GLIBC_2.17>
4006d0: bd003408 str s8, [x0, #52]
4006d4: 6d4127e8 ldp d8, d9, [sp, #16]
4006d8: a8c27bfd ldp x29, x30, [sp], #32
4006dc: d65f03c0 ret
Here is the clang compiled float_add
binary(float.clang
), which gets segfault.
00000000004006a8 <float_add>:
4006a8: 6dbe23e9 stp d9, d8, [sp, #-32]!
4006ac: a9017bfd stp x29, x30, [sp, #16]
4006b0: 910003fd mov x29, sp
4006b4: 4ea11c28 mov v8.16b, v1.16b
4006b8: 4ea01c09 mov v9.16b, v0.16b
4006bc: 97ffff99 bl 400520 <_mcount@plt>
4006c0: a9417bfd ldp x29, x30, [sp, #16]
4006c4: 1e282920 fadd s0, s9, s8
4006c8: 90000108 adrp x8, 420000 <__cxa_atexit@GLIBC_2.17>
4006cc: bd003500 str s0, [x8, #52]
4006d0: 6cc223e9 ldp d9, d8, [sp], #32
4006d4: d65f03c0 ret
The clang compiled float_add
without optimization(float.O0.clang
) is as follows.
00000000004006a8 <float_add>:
4006a8: d10083ff sub sp, sp, #0x20
4006ac: a9017bfd stp x29, x30, [sp, #16]
4006b0: 910043fd add x29, sp, #0x10
4006b4: bd0007e0 str s0, [sp, #4]
4006b8: bd0003e1 str s1, [sp]
4006bc: 97ffff99 bl 400520 <_mcount@plt>
4006c0: bd4007e0 ldr s0, [sp, #4]
4006c4: bc1fc3a0 stur s0, [x29, #-4]
4006c8: bd4003e1 ldr s1, [sp]
4006cc: bd000be1 str s1, [sp, #8]
4006d0: bc5fc3a2 ldur s2, [x29, #-4]
4006d4: bd400be3 ldr s3, [sp, #8]
4006d8: 1e232842 fadd s2, s2, s3
4006dc: 90000108 adrp x8, 420000 <__cxa_atexit@GLIBC_2.17>
4006e0: bd003502 str s2, [x8, #52]
4006e4: a9417bfd ldp x29, x30, [sp, #16]
4006e8: 910083ff add sp, sp, #0x20
4006ec: d65f03c0 ret
As I observed this more, it seems that uftrace can't capture the function exit of float_add
in clang compiled binary.
The test environment is as follows.
$ uftrace info
# system information
# ==================
# program version : v0.10-16-ge532 ( aarch64 dwarf python luajit tui perf sched dynamic )
# recorded on : Sat Jun 26 18:53:49 2021
# cmdline : uftrace record --no-sched a.out
# cpu info : ARM64 (v8)
# number of cpus : 2 / 2 (online / possible)
# memory info : 0.5 / 1.8 GB (free / total)
# system load : 1.13 / 1.12 / 1.09 (1 / 5 / 15 min)
# kernel version : Linux 5.10.21-200.fc33.aarch64
# hostname : fedora
# distro : "Fedora 33 (Workstation Edition)"
#
# process information
# ===================
# number of tasks : 1
# task list : 1256929(a.out)
# exe image : /home/parallels/work/uftrace/tests/a.out
# build id : e7ca28ead08be9684c9204081277928d21186878
# pattern : regex
# exit status : terminated by signal: 11 (Segmentation fault)
# elapsed time : 0.085246028 sec
# cpu time : 0.035 / 0.000 sec (sys / user)
# context switch : 5 / 34 (voluntary / involuntary)
# max rss : 13348 KB
# page fault : 1 / 1216 (major / minor)
# disk iops : 0 / 16 (read / write)
Here is the clang version info.
$ clang --version
clang version 11.0.0 (Fedora 11.0.0-3.fc33)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
We can compare the assembly output at https://godbolt.org/z/jT7G9fPsP.
I'm just wondering what the following instructions are in the broken binary.
4006b4: 4ea11c28 mov v8.16b, v1.16b
4006b8: 4ea01c09 mov v9.16b, v0.16b
After digging into the problem with gdb, I see that the last br x16
goes to NULL
when returning from main
function.
ENTRY(mcount_return)
/* setup frame pointer */
stp x29, x30, [sp, #-16]!
/* save return values */
stp x0, x1, [sp, #-16]!
str q0, [sp, #-16]!
/*
* save indirect result location register
* used in C++ for returning non-trivial objects
*/
stp x8, x18, [sp, #-16]!
add x0, sp, #32
bl mcount_exit
mov x16, x0 /* FIXME: here x16 becomes 0 */
/* restore indirect result location register */
ldp x8, x18, [sp], #16
/* restore return values */
ldr q0, [sp], #16
ldp x0, x1, [sp], #16
/* restore frame pointer */
ldp x29, x30, [sp], #16
br x16
END(mcount_return)
I suspect it's a clang's fault to set the frame pointer to an invalid value. In the _mcount
, we expect x29 points to a stack frame containing 'x29' and 'x30' of the previous frame. But in the float.clang binary, it's saved in x29 + 0x10 so uftrace will use an invalid value (in d8 probably) as a parent location and it'll lead to a crash.
Here is an example.
We can compile this with
clang
.Then we can trace it with uftrace.
But if this program is compiled with optimization then it crashes.
It doesn't have this problem when compiled by
gcc
with the same optimization.