sipwise / rtpengine

The Sipwise media proxy for Kamailio
GNU General Public License v3.0

Getting SIGFPE when trying to restore calls from Redis with "poller-per-thread" enabled #1801

Closed: iamhalje closed this issue 4 months ago

iamhalje commented 4 months ago

rtpengine version the issue has been seen with

11.3.0.0+0~mr11.3.0.0-1

Used distribution and its version

Red Hat Enterprise Linux release 9.3 (Plow)

Linux kernel version used

5.15.0-203.146.5.1.el9uek.x86_64

CPU architecture issue was seen on (see uname -m)

x86_64

Expected behaviour you didn't see

After restarting the rtpengine systemd service, I expected the active calls to be restored from Redis, but they are not.

Unexpected behaviour you saw

~# systemctl status rtpengine
● rtpengine.service - NGCP RTP/media Proxy Daemon
     Loaded: loaded (/usr/lib/systemd/system/rtpengine.service; enabled; preset: disabled)
     Active: activating (auto-restart) (Result: signal) since Sun 2024-03-03 20:55:32 +05; 651ms ago
    Process: 576593 ExecStartPre=/usr/sbin/ngcp-rtpengine-iptables-setup start (code=exited, status=0/SUCCESS)
    Process: 576615 ExecStart=/usr/bin/rtpengine --config-file=${CFG_FILE} --pidfile=${PID_FILE} (code=exited, status=0/SUCCESS)
    Process: 576633 ExecStopPost=/usr/sbin/ngcp-rtpengine-iptables-setup stop (code=exited, status=0/SUCCESS)
   Main PID: 576618 (code=killed, signal=FPE)
        CPU: 242ms

Steps to reproduce the problem

Configure rtpengine like this:

redis = 1.1.1.13:6380/2
redis-write = rtpengine@2.2.2.2:6379/2
subscribe-keyspace = 1
redis-num-threads = 4
no-redis-required = false
redis-expires = 7200
redis-allowed-errors = -1
redis-disable-time = 10
redis-cmd-timeout = 0
redis-connect-timeout = 10000

dtmf-log-dest = 2.2.2.2:3333
poller-per-thread = true

If we place a call with this config, even without pressing any DTMF keys, and then restart the systemd service, rtpengine is killed with SIGFPE.

Additional program output to the terminal or logs illustrating the issue

~# coredumpctl info 1158477
           PID: 1158477 (rtpengine)
           UID: 979 (ngcp-rtpengine)
           GID: 979 (ngcp-rtpengine)
        Signal: 8 (FPE)
     Timestamp: Sun 2024-03-03 21:11:51 +05 (16s ago)
  Command Line: /usr/bin/rtpengine --config-file=/etc/rtpengine/rtpengine.conf --pidfile=/run/rtpengine/rtpengine.pid
    Executable: /usr/bin/rtpengine
 Control Group: /system.slice/rtpengine.service
          Unit: rtpengine.service
         Slice: system.slice
       Boot ID: 8d684d6c4c7f441a9273b52fe94bd9fc
    Machine ID: 0efcc658c8dd4ecea927537fd1ade170
      Hostname: rtpengine
       Storage: /var/lib/systemd/coredump/core.rtpengine.979.8d684d6c4c7f441a9273b52fe94bd9fc.1158477.1709482311000000.zst (present)
  Size on Disk: 10.3M
       Message: Process 1158477 (rtpengine) of user 979 dumped core.

                Stack trace of thread 1158485:
                #0  0x000000000042793d poller_map_get (rtpengine + 0x2793d)
                #1  0x0000000000505284 stream_fd_new (rtpengine + 0x105284)
                #2  0x0000000000464f09 redis_sfds (rtpengine + 0x64f09)
                #3  0x000000000046a133 json_restore_call (rtpengine + 0x6a133)
                #4  0x000000000046af7e restore_thread (rtpengine + 0x6af7e)
                #5  0x00007fbdb41e94d4 g_thread_pool_thread_proxy.lto_priv.0 (libglib-2.0.so.0 + 0x864d4)
                #6  0x00007fbdb41e65e2 g_thread_proxy (libglib-2.0.so.0 + 0x835e2)
                #7  0x00007fbdb1776812 start_thread (libc.so.6 + 0x9f812)
                #8  0x00007fbdb1716450 __clone3 (libc.so.6 + 0x3f450)

                Stack trace of thread 1158477:
                #0  0x00007fbdb1715e5d syscall (libc.so.6 + 0x3ee5d)
                #1  0x00007fbdb42065c3 g_cond_wait (libglib-2.0.so.0 + 0xa35c3)
                #2  0x00007fbdb41e97eb g_thread_pool_free (libglib-2.0.so.0 + 0x867eb)
                #3  0x000000000046b4c5 redis_restore (rtpengine + 0x6b4c5)
                #4  0x0000000000424b16 do_redis_restore (rtpengine + 0x24b16)
                #5  0x000000000042546e main (rtpengine + 0x2546e)
                #6  0x00007fbdb1716eb0 __libc_start_call_main (libc.so.6 + 0x3feb0)
                #7  0x00007fbdb1716f60 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x3ff60)
                #8  0x0000000000416745 _start (rtpengine + 0x16745)

                Stack trace of thread 1158478:
                #0  0x00007fbdb172caca __sigtimedwait (libc.so.6 + 0x55aca)
                #1  0x00000000004168c1 sighandler (rtpengine + 0x168c1)
                #2  0x000000000042a7b5 thread_detach_func (rtpengine + 0x2a7b5)
                #3  0x00007fbdb1776812 start_thread (libc.so.6 + 0x9f812)
                #4  0x00007fbdb1716450 __clone3 (libc.so.6 + 0x3f450)

                Stack trace of thread 1158479:
                #0  0x00007fbdb17ea9f5 clock_nanosleep@GLIBC_2.2.5 (libc.so.6 + 0x1139f5)
                #1  0x00007fbdb17ef5a7 __nanosleep (libc.so.6 + 0x1185a7)
                #2  0x00007fbdb181c679 usleep (libc.so.6 + 0x145679)
                #3  0x00000000004297f5 poller_timer_loop (rtpengine + 0x297f5)
                #4  0x000000000042a7b5 thread_detach_func (rtpengine + 0x2a7b5)
                #5  0x00007fbdb1776812 start_thread (libc.so.6 + 0x9f812)
                #6  0x00007fbdb1716450 __clone3 (libc.so.6 + 0x3f450)

                Stack trace of thread 1158481:
                #0  0x00007fbdb18258be epoll_wait (libc.so.6 + 0x14e8be)
                #1  0x00007fbdb3bc1a8c epoll_dispatch.lto_priv.0 (libevent-2.1.so.7 + 0x2ea8c)
                #2  0x00007fbdb3bb8fc1 event_base_loop (libevent-2.1.so.7 + 0x25fc1)
                #3  0x0000000000460e12 redis_notify (rtpengine + 0x60e12)
                #4  0x000000000046166d redis_notify_loop (rtpengine + 0x6166d)
                #5  0x000000000042a7b5 thread_detach_func (rtpengine + 0x2a7b5)
                #6  0x00007fbdb1776812 start_thread (libc.so.6 + 0x9f812)
                #7  0x00007fbdb1716450 __clone3 (libc.so.6 + 0x3f450)
                ELF object binary architecture: AMD x86-64

Anything else?

~# gdb /usr/bin/rtpengine /var/lib/systemd/coredump/core.rtpengine.979.8d684d6c4c7f441a9273b52fe94bd9fc.1158477.1709482311000000
Reading symbols from /usr/bin/rtpengine...
Reading symbols from /usr/lib/debug/usr/bin/rtpengine-11.3.0.0+0~mr11.3.0.0-1.el9.x86_64.debug...
[New LWP 1158485]
[New LWP 1158477]
[New LWP 1158478]
[New LWP 1158479]
[New LWP 1158481]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/bin/rtpengine --config-file=/etc/rtpengine/rtpengine.conf --pidfile=/run/r'.
Program terminated with signal SIGFPE, Arithmetic exception.
#0  0x000000000042793d in poller_map_get (map=0x14070c0) at poller.c:93
93                      p = g_hash_table_lookup(map->table, arr[ssl_random() % g_hash_table_size(map->table)]);
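
The faulting line takes `ssl_random()` modulo `g_hash_table_size(map->table)`. On x86-64 Linux an integer division or modulo by zero is delivered as SIGFPE, so this crash would be consistent with the poller map still being empty when the restore thread calls `poller_map_get`. That is an inference from the backtrace, not something confirmed here. A minimal sketch of the kind of guard that avoids the trap is shown below; `pick_random_index` is a hypothetical helper written for illustration and is not part of rtpengine.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not rtpengine code): derive an index into a
 * collection of `count` entries from a random value `rnd`.
 *
 * Returns false and leaves *index_out untouched when the collection is
 * empty, instead of evaluating `rnd % 0`, which raises SIGFPE on x86-64.
 */
static bool pick_random_index(unsigned long rnd, size_t count, size_t *index_out)
{
    if (count == 0)
        return false;               /* nothing to pick; caller needs a fallback */

    *index_out = (size_t)(rnd % count);
    return true;
}
```

A caller would check the return value and fall back to some default (for example, creating an entry first) when it is false. Whether the actual fix in rtpengine takes this shape is not stated in this report.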