nginx / unit

NGINX Unit - universal web app server - a lightweight and versatile open source server that simplifies the application stack by natively executing application code across eight different programming language runtimes.
https://unit.nginx.org
Apache License 2.0
5.4k stars 328 forks source link

NGINX_Unit: [BUG] Segmentation fault at 0x0000000000000040 #919

Open vt-alt opened 1 year ago

vt-alt commented 1 year ago

Unit crashes with ruby module (on ALT Linux). unit.log:

NGINX_Unit: [BUG] Segmentation fault at 0x0000000000000040
ruby 3.1.2p20 (2022-04-12 revision a21a3b7d23) [x86_64-linux]
...
-- C level backtrace information -------------------------------------------
2023/08/13 17:48:13 [notice] 5189#5189 process 5193 exited with code 0
/usr/lib64/libruby.so.3.1(rb_vm_bugreport+0x508) [0x7f5d7885e478]
/usr/lib64/libruby.so.3.1(0xaf8f0) [0x7f5d786af8f0]
/usr/lib64/libruby.so.3.1(0x1d2d6b) [0x7f5d787d2d6b]
/lib64/libc.so.6(__restore_rt+0x0) [0x7f5d78a57dd0]
/usr/lib64/libruby.so.3.1(0xb3ba1) [0x7f5d786b3ba1]
/usr/lib64/libruby.so.3.1(rb_ensure+0x26b) [0x7f5d786b6f6b]
/usr/lib64/libruby.so.3.1(0x204a73) [0x7f5d78804a73]
/usr/lib64/libruby.so.3.1(rb_nogvl+0x2a6) [0x7f5d7880ec66]
/usr/lib64/unit/modules/ruby.unit.so(nxt_ruby_start+0xeaa) [0x7f5d7912545a]
unit: "debuginfod" application(0x1dfcb) [0x5589d2c0bfcb]
unit: "debuginfod" application(0x1f0d7) [0x5589d2c0d0d7]
unit: "debuginfod" application(0x1d474) [0x5589d2c0b474]
unit: "debuginfod" application(0x1b5ce) [0x5589d2c095ce]
unit: "debuginfod" application(0x1c6a3) [0x5589d2c0a6a3]
unit: "debuginfod" application(nxt_event_engine_start+0x8c) [0x5589d2c0f63c]
unit: "debuginfod" application(0x11a85) [0x5589d2bffa85]
/lib64/libc.so.6(0x2378c) [0x7f5d78a4278c]
/lib64/libc.so.6(__libc_start_main+0x85) [0x7f5d78a42845]
unit: "debuginfod" application(_start+0x21) [0x5589d2bffce1]

-- Other runtime information -----------------------------------------------

(I may attach full log if you request.) I think no ruby script should do that.

# coredumpctl debug
           PID: 5195 (unitd)
           UID: 483 (_cgi)
           GID: 500 (indexer)
        Signal: 6 (ABRT)
     Timestamp: Sun 2023-08-13 17:48:13 MSK (1min 50s ago)
  Command Line: $'unit: "debuginfod" application'
    Executable: /usr/sbin/unitd
 Control Group: /system.slice/unit.service
          Unit: unit.service
         Slice: system.slice
       Boot ID: 0cb047777a674705a3b4d33b9d5f57a3
    Machine ID: 95a7ac09c4274a82b2269c548b58c63b
      Hostname: debuginfo.mskdc.altlinux.org
       Storage: /var/lib/systemd/coredump/core.unitd.483.0cb047777a674705a3b4d33b9d5f57a3.5195.1691938093000000.zst (present)
  Size on Disk: 68.6M
       Message: Process 5195 (unitd) of user 483 dumped core.

GNU gdb (GDB) 11.2-alt1 (ALT Sisyphus)
...
Reading symbols from /usr/sbin/unitd...
Reading symbols from /usr/lib/debug/usr/sbin/unitd.debug...
...
Core was generated by `unit: "debuginfod" applicati'.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
[Current thread is 1 (Thread 0x7f5d7912e940 (LWP 5195))]

(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007f5d78aa563f in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2  0x00007f5d78a57d32 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007f5d78a414af in __GI_abort () at abort.c:79
#4  0x00007f5d78638872 in die () at error.c:784
#5  rb_bug_for_fatal_signal (default_sighandler=<optimized out>, sig=<optimized out>, sig@entry=11, ctx=<optimized out>, ctx@entry=0x5589d4a23200,
    fmt=<optimized out>, fmt@entry=0x7f5d788c079d "Segmentation fault at %p") at error.c:825
#6  0x00007f5d787d2d6b in sigsegv (sig=11, info=0x5589d4a23330, ctx=0x5589d4a23200) at signal.c:964
#7  <signal handler called>
#8  0x00007f5d786b3ba1 in rb_ec_tag_jump (st=st@entry=RUBY_TAG_RAISE, ec=0x5589d498b9f0) at /usr/src/debug/ruby-3.1.2/eval_intern.h:157
#9  0x00007f5d786b6f6b in rb_ensure (b_proc=b_proc@entry=0x7f5d7880e1b0 <thread_join_sleep>, data1=data1@entry=140724026465568,
    e_proc=e_proc@entry=0x7f5d78803f90 <remove_from_join_list>, data2=data2@entry=140724026465568) at eval.c:1004
#10 0x00007f5d78804a73 in thread_join (target_th=0x5589d525ae50, timeout=8, limit=0x0) at thread.c:1246
#11 0x00007f5d7880ec66 in thread_value (self=<optimized out>) at thread.c:1366
#12 rb_nogvl (func=<optimized out>, data1=<optimized out>, ubf=<optimized out>, data2=<optimized out>, flags=<optimized out>) at thread.c:1699
#13 0x00007f5d7912545a in nxt_ruby_start (task=<optimized out>, data=<optimized out>) at src/ruby/nxt_ruby.c:382
#14 0x00005589d2c0bfcb in nxt_process_do_start (task=task@entry=0x5589d4963a50, process=process@entry=0x5589d4978300) at src/nxt_process.c:722
#15 0x00005589d2c0d0d7 in nxt_process_whoami_ok (task=0x5589d4963a50, msg=<optimized out>, data=0x5589d4978300) at src/nxt_process.c:850
#16 0x00005589d2c0b474 in nxt_port_rpc_handler (task=0x5589d4963a50, msg=0x7ffcdd9be290) at src/nxt_port_rpc.c:347
#17 0x00005589d2c095ce in nxt_port_read_msg_process (task=task@entry=0x5589d4963a50, port=port@entry=0x5589d49784a0, msg=msg@entry=0x7ffcdd9be290)
    at src/nxt_port_socket.c:1271
#18 0x00005589d2c0a6a3 in nxt_port_read_handler (task=0x5589d4963a50, obj=<optimized out>, data=<optimized out>) at src/nxt_port_socket.c:778
#19 0x00005589d2c0f63c in nxt_event_engine_start (engine=<optimized out>) at src/nxt_event_engine.c:542
#20 0x00005589d2bffa85 in main (argc=<optimized out>, argv=<optimized out>) at src/nxt_main.c:35
(gdb) frame 8
#8  0x00007f5d786b3ba1 in rb_ec_tag_jump (st=st@entry=RUBY_TAG_RAISE, ec=0x5589d498b9f0) at /usr/src/debug/ruby-3.1.2/eval_intern.h:157
157         ec->tag->state = st;
(gdb) list
152
153     NORETURN(static inline void rb_ec_tag_jump(const rb_execution_context_t *ec, enum ruby_tag_type st));
154     static inline void
155     rb_ec_tag_jump(const rb_execution_context_t *ec, enum ruby_tag_type st)
156     {
157         ec->tag->state = st;
158         ruby_longjmp(ec->tag->buf, 1);
159     }
160
161     /*
(gdb) p ec
$1 = (const rb_execution_context_t *) 0x5589d498b9f0
(gdb) p ec->tag
$2 = (struct rb_vm_tag *) 0x0
(gdb) frame 13
#13 0x00007f5d7912545a in nxt_ruby_start (task=<optimized out>, data=<optimized out>) at src/ruby/nxt_ruby.c:382
382         rc = (intptr_t) rb_thread_call_without_gvl2(nxt_ruby_unit_run, unit_ctx,
(gdb) list
377                 nxt_ruby_exception_log(NULL, NXT_LOG_ERR,
378                                        "Failed to call on_thread_boot()");
379             }
380         }
381
382         rc = (intptr_t) rb_thread_call_without_gvl2(nxt_ruby_unit_run, unit_ctx,
383                                                     nxt_ruby_ubf, unit_ctx);
384
385         if (nxt_ruby_hook_procs != Qnil) {
386             rb_protect(nxt_ruby_hook_call, nxt_rb_on_thread_shutdown, &state);

Versions:

# rpm -q unit unit-ruby libruby
unit-1.30.0-alt1.x86_64
unit-ruby-1.30.0-alt1.x86_64
libruby-3.1.2-alt2.1.x86_64
tippexs commented 1 year ago

Thanks for reporting this to Us. Do you have any example code that we can reproduce this issue?

vt-alt commented 1 year ago

This is our experimental debuginfod server (config). But without our dataset I'm not sure you will be able to easily run it. I will try to minify it later when have more time for this. It's run there https://debuginfod.altlinux.org/ after that internal crash some functions work some not.

vt-alt commented 1 year ago

I will try to minify it later when have more time for this.

OK. Even tiny example from https://unit.nginx.org/howto/samples/#ruby crashes:

from app = Proc.new do |env|
    ["200", {
        "Content-Type" => "text/plain",
    }, ["Hello, Ruby on Unit!"]]
end

run app

So we can rule out influence of bundler and rocksdb.

vt-alt commented 1 year ago

On 'minified' system environment this does not crash. I will try to catch the difference. 👷

vt-alt commented 1 year ago

Excuse me, (after much experimenting) I just realized that this message is happening not on unitd start but on unitd termination. Because I was restarting with systemctl restart unit termination and start events are so close in time that I thought stack trace was on the start (and relating to some unitd thread or process while other parts continue to work).

This perhaps still a bug, even though it isn't as Important as I thought before.

It's reproducible on quite bare system (on ALT Linux) with example configs (from docs), with killall unitd (I didn't find other termination method).

If this behavior is intentional please close the issue. But I'd wish there is graceful termination method. 🤷

lcrilly commented 1 year ago

Thanks for isolating this segfault - it gives us something to start with. It might be by design but will investigate.

ac000 commented 1 year ago

Sounds like it might be a long standing issue with various fixes, but not 100% 2bd4a455278dbce756c1dc8be07515f73d611e3b, b26624fc10cc1f4eb88c16ac8fb5b6b4b61895fc, #562

vt-alt commented 1 year ago

Yes these are from 2022 but we have 1.30.0 which is 2023.

ac000 commented 1 year ago

The point is, this is likely a known issue that remains unresolved.

Could you test without threads in your config, just remove or comment out that config line.

vt-alt commented 1 year ago

Just tested it (several times) and it still crashes.

root@debuginfo:~# jobs
[1]+  Running                 tail -f /var/log/unit/unit.log &
root@debuginfo:~# unitc /config/applications
{
  "debuginfod": {
    "type": "ruby",
    "working_directory": "/home/indexer/debuginfer",
    "user": "_cgi",
    "group": "indexer",
    "script": "debuginfod.ru"
  }
}
root@debuginfo:~# killall unitd
2023/08/28 06:02:22 [notice] 3198281#3198281 signal signo:15 (SIGTERM) received
NGINX_Unit: [BUG] Segmentation fault at 0x0000000000000040
ac000 commented 1 year ago

As long as you're not using threads, this patch still prevents the crash

diff --git a/src/ruby/nxt_ruby.c b/src/ruby/nxt_ruby.c
index bcb48f6b..8807edd2 100644
--- a/src/ruby/nxt_ruby.c
+++ b/src/ruby/nxt_ruby.c
@@ -380,7 +380,7 @@ nxt_ruby_start(nxt_task_t *task, nxt_process_data_t *data)
     }

     rc = (intptr_t) rb_thread_call_without_gvl2(nxt_ruby_unit_run, unit_ctx,
-                                                nxt_ruby_ubf, unit_ctx);
+                                                RUBY_UBF_IO, NULL);

     if (nxt_ruby_hook_procs != Qnil) {
         rb_protect(nxt_ruby_hook_call, nxt_rb_on_thread_shutdown, &state);

That works at least with Ruby 3.1.2

I guess something changed in Ruby from when 2bd4a455278dbce756c1dc8be07515f73d611e3b went in that means it no longer works... (although it's now not clear to me what cases that did fix...).