namhyung / uftrace

Function graph tracer for C/C++/Rust/Python
https://uftrace.github.io/slide/
GNU General Public License v2.0
3.01k stars 441 forks source link

WARN: Segmentation fault: address not mapped in llama.cpp #1949

Open honggyukim opened 3 weeks ago

honggyukim commented 3 weeks ago

The llama.cpp project already has an option to add -pg option with LLAMA_GPROF=1.

But it gets crashed when llama-cli is traced with uftrace as follows.

$ git clone https://github.com/ggerganov/llama.cpp.git && cd llama.cpp

$ git checkout 2339a0be1c8e31fcf4531427183b94f2ef019e56

$ make LLAMA_GPROF=1 -j

$ uftrace record --no-libcall ./llama-cli 
Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723990922
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-12-g8382 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [8] (llama_load_model_from_file[5a2eea7cfda4] <= llama_init_from_gpt_params[5a2eea86b78e])
WARN: [7] (llama_init_from_gpt_params[5a2eea86b700] <= main[5a2eea6d4392])
WARN: [6] (log_dump_cmdline_impl[5a2eea8d012d] <= main[5a2eea6d3e46])
WARN: [5] (main[5a2eea6d3bec] <= __libc_start_call_main[7aad7802a1ca])
WARN: [4] (__static_initialization_and_destruction_0[5a2eea6d28a9] <= mcount_return[7aad788657ef])
WARN: [3] (_GLOBAL__sub_I_json_schema_to_grammar.cpp[5a2eea6d3bae] <= __libc_start_main[7aad7802a304])
WARN: [2] (_GLOBAL__sub_I_console.cpp[5a2eea6d268e] <= __libc_start_main[7aad7802a304])
WARN: [1] (__static_initialization_and_destruction_0[5a2eea6d2402] <= mcount_return[7aad788657ef])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[5a2eea6d266e] <= __libc_start_main[7aad7802a304])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault

It still gets crashed even with -e option.

$ uftrace record --no-libcall -e ./llama-cli 
Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723990966
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN: Backtrace from uftrace v0.16-12-g8382 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [6] (llama_load_model_from_file[5bc0d1b9ada4] <= llama_init_from_gpt_params[5bc0d1c3678e])
WARN: [5] (llama_model_default_params[5bc0d1b41bae] <= llama_model_params_from_gpt_params[5bc0d1c306af])
WARN: [4] (llama_model_params_from_gpt_params[5bc0d1c30695] <= llama_init_from_gpt_params[5bc0d1c36737])
WARN: [3] (llama_init_from_gpt_params[5bc0d1c36700] <= main[5bc0d1a9f392])
WARN: [2] (main[5bc0d1a9ebec] <= __libc_start_call_main[70fd6b82a1ca])
WARN: [1] (__static_initialization_and_destruction_0[5bc0d1a9d8a9] <= __libc_start_main[70fd6b82a304])
WARN: [0] (_GLOBAL__sub_I_json_schema_to_grammar.cpp[5bc0d1a9ebae] <= __libc_start_main[70fd6b82a304])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
honggyukim commented 3 weeks ago

Here is the backtrace.

$ gdb -q --args uftrace record --no-libcall -e ./llama-cli
Reading symbols from uftrace...
(gdb) r

Log start
main: build = 3602 (82d8e7f3)
main: built with cc (Ubuntu 13.2.0-23ubuntu4) 13.2.0 for x86_64-linux-gnu
main: seed  = 1723991141

Thread 2.1 "llama-cli" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7e868c0 (LWP 27737)]
0x000055555568bdc2 in llama_load_model_from_file (path_model=0x5555558c06e0 "models/7B/ggml-model-f16.gguf", params=...) at src/llama.cpp:16936
16936           struct llama_model_params   params) {
(gdb) bt
#0  0x000055555568bdc2 in llama_load_model_from_file (path_model=0x5555558c06e0 "models/7B/ggml-model-f16.gguf", params=...) at src/llama.cpp:16936
#1  0x000055555572778e in llama_init_from_gpt_params (params=...) at common/common.cpp:2107
#2  0x0000555555590392 in main (argc=<optimized out>, argv=<optimized out>) at examples/main/main.cpp:210
namhyung commented 2 weeks ago

Looks like a memory corruption on the stack. I've added a debug message and the problem was gone. :(

namhyung commented 2 weeks ago

It seems the llama_load_model_from_file passes a large struct llama_model_param by value and it's overwritten somewhere..

namhyung commented 2 weeks ago

I think I found the reason.. Strangely, it seems the llama binary didn't followed the x86_64 calling convention and used r10 register across a function call. Here's the disassembly of the llama_load_model_from_file:

(gdb) disas
Dump of assembler code for function llama_load_model_from_file(char const*, llama_model_params):
Address range 0x555555686f80 to 0x555555688d7a:
   0x0000555555686f80 <+0>: push   %rbp
   0x0000555555686f81 <+1>: mov    %rsp,%rbp
   0x0000555555686f84 <+4>: push   %r15
   0x0000555555686f86 <+6>: push   %r14
   0x0000555555686f88 <+8>: push   %r13
   0x0000555555686f8a <+10>:    push   %r12
   0x0000555555686f8c <+12>:    push   %r10
   0x0000555555686f8e <+14>:    lea    0x10(%rbp),%r10
   0x0000555555686f92 <+18>:    push   %rbx
   0x0000555555686f93 <+19>:    sub    $0x1e0,%rsp
   0x0000555555686f9a <+26>:    call   *0x16dd28(%rip)        # 0x5555557f4cc8   (mcount)
   0x0000555555686fa0 <+32>:    mov    0x20(%r10),%r15
   0x0000555555686fa4 <+36>:    mov    0x28(%r10),%rax
   0x0000555555686fa8 <+40>:    mov    %r10,-0x1e0(%rbp)
   0x0000555555686faf <+47>:    mov    %rdi,-0x1d8(%rbp)
   0x0000555555686fb6 <+54>:    mov    %rax,-0x208(%rbp)
   0x0000555555686fbd <+61>:    mov    %r15,-0x200(%rbp)
   0x0000555555686fc4 <+68>:    call   0x5555555ad010 <ggml_time_init>
  ...

The r10 register was set at offset 14, but used after calling mcount at offset 32 and 36. The value of r10 was overwritten during mcount because it's a temporary register. I think compiler should generate code to save r10 before offset 26. Probably this is a bug in the compiler but we cannot simply blame it and done with it.

The following patch seems to work well for -e at least. And I found I did the same for the dynamic tracing already. :)

diff --git a/arch/x86_64/mcount.S b/arch/x86_64/mcount.S
index dd32b0a6..8f0f07e0 100644
--- a/arch/x86_64/mcount.S
+++ b/arch/x86_64/mcount.S
@@ -57,9 +57,13 @@ GLOBAL(mcount)

        /* save rax (implicit argument for variadic functions) */
        push %rax
+       push %r10
+       push %r11

        call mcount_entry

+       pop  %r11
+       pop  %r10
        pop  %rax

        /* restore original stack pointer */
honggyukim commented 2 weeks ago

Thanks for the investigation.

I see it works fine with -e, but still gets crashed even with your fix about r10 and r11 when -e is unused.

$ uftrace record --no-libcall ./llama-cli
Log start
main: build = 3601 (2339a0be)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1724667048
gguf_init_from_file: failed to open 'models/7B/ggml-model-f16.gguf': 'No such file or directory'
WARN: process crashed by signal 11: Segmentation fault (si_code: 128)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-15-gf8a7 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [8] (gguf_init_from_file[56071b6ece4d] <= llama_model_loader::llama_model_loader[56071b7cf28d])
WARN: [7] (llama_model_loader::llama_model_loader[56071b7cee08] <= llama_load_model_from_file[56071b794731])
WARN: [6] (llama_load_model_from_file[56071b794354] <= llama_init_from_gpt_params[56071b829c2c])
WARN: [5] (llama_init_from_gpt_params[56071b829b90] <= main[56071b689904])
WARN: [4] (log_dump_cmdline_impl[56071b88c43d] <= main[56071b6893a5])
WARN: [3] (main[56071b68910a] <= __libc_start_call_main[7fc4d2a29d90])
WARN: [2] (_GLOBAL__sub_I_common.cpp[56071b687fd4] <= __libc_start_main[7fc4d2a29ebb])
WARN: [1] (__static_initialization_and_destruction_0[56071b687d32] <= mcount_return[7fc4d31b6007])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[56071b687fae] <= __libc_start_main[7fc4d2a29ebb])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault 
namhyung commented 2 weeks ago

How about this too?

 ENTRY(mcount_return)
        .cfi_startproc
-       sub $48, %rsp
-       .cfi_def_cfa_offset 48
-
-       movq   %rdi,  32(%rsp)
+       sub $96, %rsp
+       .cfi_def_cfa_offset 96
+
+       /* save all caller-saved registers due to -fipa-ra */
+       movq   %r11,  80(%rsp)
+       movq   %r10,  72(%rsp)
+       movq   %r9,   64(%rsp)
+       movq   %r8,   56(%rsp)
+       movq   %rdi,  48(%rsp)
+       movq   %rsi,  40(%rsp)
+       movq   %rcx,  32(%rsp)
+
+       /* below are used to carry return value */
        movdqu %xmm0, 16(%rsp)
        movq   %rdx,   8(%rsp)
        movq   %rax,   0(%rsp)
@@ -113,14 +129,21 @@ ENTRY(mcount_return)
        movq    0(%rsp), %rsp

        /* restore original return address in parent */
-       movq    %rax, 40(%rsp)
+       movq    %rax, 88(%rsp)

        movq    0(%rsp), %rax
        movq    8(%rsp), %rdx
        movdqu 16(%rsp), %xmm0
-       movq   32(%rsp), %rdi

-       add $40, %rsp
+       movq   32(%rsp), %rcx
+       movq   40(%rsp), %rsi
+       movq   48(%rsp), %rdi
+       movq   56(%rsp), %r8
+       movq   64(%rsp), %r9
+       movq   72(%rsp), %r10
+       movq   80(%rsp), %r11
+
+       add    $88, %rsp
        .cfi_def_cfa_offset 8
        retq
        .cfi_endproc
honggyukim commented 2 weeks ago

Thanks but still not working.

$ uftrace record --no-libcall ./llama-cli
Log start
main: build = 3601 (2339a0be)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1724738400
WARN: Segmentation fault: address not mapped (addr: 0x20)
WARN:  if this happens only with uftrace, please consider -e/--estimate-return option.

WARN: Backtrace from uftrace v0.16-17-g06b3a13 ( x86_64 dwarf python3 luajit tui perf sched dynamic kernel )
WARN: =====================================
WARN: [6] (llama_load_model_from_file[55fbf4d99354] <= llama_init_from_gpt_params[55fbf4e2ec2c])
WARN: [5] (llama_init_from_gpt_params[55fbf4e2eb90] <= main[55fbf4c8e904])
WARN: [4] (log_dump_cmdline_impl[55fbf4e9143d] <= main[55fbf4c8e3a5])
WARN: [3] (main[55fbf4c8e10a] <= __libc_start_call_main[7fe7fca29d90])
WARN: [2] (_GLOBAL__sub_I_common.cpp[55fbf4c8cfd4] <= __libc_start_main[7fe7fca29ebb])
WARN: [1] (__static_initialization_and_destruction_0[55fbf4c8cd32] <= mcount_return[7fe7fd1cafff])
WARN: [0] (_GLOBAL__sub_I_unicode_data.cpp[55fbf4c8cfae] <= __libc_start_main[7fe7fca29ebb])

Please report this bug to https://github.com/namhyung/uftrace/issues.

WARN: child terminated by signal: 11: Segmentation fault
namhyung commented 2 weeks ago

For some reason, I cannot reproduce it within gdb.

honggyukim commented 1 week ago

That's weird. Do you mean you cannot reproduce it only when gdb is used?

namhyung commented 1 week ago

Yep, I can reproduce it without gdb. Anyway, I think we can merge what we found so far.

honggyukim commented 1 week ago

Hmm.. the second change at https://github.com/namhyung/uftrace/issues/1949#issuecomment-2310583252 doesn't fix anything. It fails even with and without -e option while the first change makes it work with -e at least.

namhyung commented 1 week ago

Yeah but I think it should be added together.