openresty / stapxx

Simple macro language extensions to systemtap

Detection of memory leak in a high-volume system causes reboots #29

Open weldpua2008 opened 7 years ago

weldpua2008 commented 7 years ago

Hello, we are using the following script to generate Memory Leak Flame Graphs on our production servers every 30 minutes:

/usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-gc-objs.sxx -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` -D MAXACTION=200000

/usr/src/stapxx/stap++ /usr/src/stapxx/samples/sample-bt-leaks.sxx  -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` --arg time=5 -D STP_NO_OVERLOAD -D MAXMAPENTRIES=10000 > a.bt
/usr/src/FlameGraph/stackcollapse-stap.pl  a.bt >  a.cbt
/usr/src/FlameGraph/flamegraph.pl --countname=bytes --title="Memory Leak Flame Graph" a.cbt > a.svg
cp a.svg  /code/www/
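
The worker-selection pipeline used in the commands above can be factored out so that the chosen PID is validated before any tool attaches to it. This is a minimal sketch, not part of stapxx: `pick_worker_pid` and `require_pid` are hypothetical helper names, and the sketch assumes `ps -f` prints the PID in the second column, as the original `awk` expression does.

```shell
#!/bin/sh
# pick_worker_pid: read `ps --no-headers -fC nginx` output on stdin and
# print the PID of one randomly chosen worker (prints nothing if none match).
pick_worker_pid() {
    awk '/worker/ {print $2}' | shuf | head -n 1
}

# require_pid: succeed only if the argument is a non-empty decimal number.
require_pid() {
    case "$1" in
        ''|*[!0-9]*) echo "no nginx worker PID found" >&2; return 1 ;;
    esac
}

# Usage, with the same tool and options as the first command above:
#   pid=$(ps --no-headers -fC nginx | pick_worker_pid)
#   require_pid "$pid" && /usr/src/stapxx/stap++ \
#       /usr/src/stapxx/samples/lj-gc-objs.sxx -x "$pid" -D MAXACTION=200000
```

Storing the PID in a variable also makes it possible to pass the same worker to both `lj-gc-objs.sxx` and `sample-bt-leaks.sxx`, whereas the commands as posted pick a worker independently each time.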

We are also using the following scripts:

#every 60 minutes
 /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-lua-stacks.sxx --arg time=5 --skip-badvars -x 6372 > /tmp/result.bt
/usr/src/openresty-systemtap-toolkit/fix-lua-bt /tmp/result.bt > /tmp/result-fix.bt
/usr/src/FlameGraph/stackcollapse-stap.pl /tmp/result-fix.bt > /tmp/result.cbt
/usr/src/FlameGraph/flamegraph.pl --encoding="ISO-8859-1" --title="Lua-land on-CPU for (`hostname`) at `date`" /tmp/result.cbt > /tlvmedia/code/www/result.svg

########
# every 10 minutes 
sudo stdbuf -oL /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-vm-states.sxx -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` --arg time=10 &> /tmp/lua-vm-state; echo 1;

After installing debug symbols (debuginfo-install glibc), we are getting random reboots:

# last reboot
reboot   system boot  2.6.32-642.15.1. Thu May 11 09:51 - 12:17  (02:26)
reboot   system boot  2.6.32-642.15.1. Thu May 11 07:51 - 12:17  (04:26)
reboot   system boot  2.6.32-642.15.1. Thu May 11 07:21 - 12:17  (04:56)
reboot   system boot  2.6.32-642.15.1. Thu May 11 03:21 - 12:17  (08:56)
reboot   system boot  2.6.32-642.15.1. Thu May 11 02:22 - 12:17  (09:54)

We are running CentOS 6.8:

# uname -a
Linux s 2.6.32-642.15.1.el6.x86_64 #1 SMP Fri Feb 24 14:31:22 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

agentzh commented 7 years ago

@weldpua2008 No, you should never use this tool on high-volume systems, because it relies on code instrumentation rather than sampling, unlike the on-CPU flame graph sampling tool.

weldpua2008 commented 7 years ago

@agentzh We have been using the following scripts for months:

#every 60 minutes
# Full code https://gist.github.com/weldpua2008/8b60d336cdd2fee233812dd44cbd50c6
# 
 /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-lua-stacks.sxx --arg time=5 --skip-badvars -x 6372 > /tmp/result.bt
/usr/src/openresty-systemtap-toolkit/fix-lua-bt /tmp/result.bt > /tmp/result-fix.bt
/usr/src/FlameGraph/stackcollapse-stap.pl /tmp/result-fix.bt > /tmp/result.cbt
/usr/src/FlameGraph/flamegraph.pl --encoding="ISO-8859-1" --title="Lua-land on-CPU for (`hostname`) at `date`" /tmp/result.cbt > /tlvmedia/code/www/result.svg

########
# every 10 minutes 
sudo stdbuf -oL /usr/src/stapxx/stap++ /usr/src/stapxx/samples/lj-vm-states.sxx -x `ps --no-headers -fC nginx|awk '/worker/  {print$2}'| shuf | head -n 1` --arg time=10 &> /tmp/lua-vm-state

But after scheduling the Memory Leak Flame Graph script to run every 30 minutes (the full version is at https://gist.github.com/weldpua2008/44e6884ac2bc6d0c129ddf03a9336656), we started experiencing reboots.

weldpua2008 commented 7 years ago

@agentzh,

# openresty -V
nginx version: openresty/1.11.2.3
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)
built with OpenSSL 1.0.2k  26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/openresty/nginx --with-cc-opt='-O2 -I/usr/local/openresty/zlib/include -I/usr/local/openresty/pcre/include -I/usr/local/openresty/openssl/include' --add-module=../ngx_devel_kit-0.3.0 --add-module=../echo-nginx-module-0.60 --add-module=../xss-nginx-module-0.05 --add-module=../ngx_coolkit-0.2rc3 --add-module=../set-misc-nginx-module-0.31 --add-module=../form-input-nginx-module-0.12 --add-module=../encrypted-session-nginx-module-0.06 --add-module=../srcache-nginx-module-0.31 --add-module=../ngx_lua-0.10.8 --add-module=../ngx_lua_upstream-0.06 --add-module=../headers-more-nginx-module-0.32 --add-module=../array-var-nginx-module-0.05 --add-module=../memc-nginx-module-0.18 --add-module=../redis2-nginx-module-0.14 --add-module=../redis-nginx-module-0.3.7 --with-ld-opt='-Wl,-rpath,/usr/local/openresty/luajit/lib -L/usr/local/openresty/zlib/lib -L/usr/local/openresty/pcre/lib -L/usr/local/openresty/openssl/lib -Wl,-rpath,/usr/local/openresty/zlib/lib:/usr/local/openresty/pcre/lib:/usr/local/openresty/openssl/lib' --with-pcre-jit --with-ipv6 --with-stream --with-stream_ssl_module --with-http_v2_module --without-mail_pop3_module --without-mail_imap_module --without-mail_smtp_module --with-http_stub_status_module --with-http_realip_module --with-http_addition_module --with-http_auth_request_module --with-http_secure_link_module --with-http_random_index_module --with-http_geoip_module --with-http_gzip_static_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-threads --with-file-aio --with-dtrace-probes --with-http_ssl_module

I have installed crash and the kernel debuginfo to read the vmcore. The vmcore-dmesg.txt content:

<7>stap_5d92dc3bc2adb3726c22e6eda3972f60_31180: systemtap: 2.9/0.164, base: ffffffffa0413000, memory: 3135data/69text/1110ctx/2058net/121062alloc kb, probes: 4
<1>BUG: unable to handle kernel NULL pointer dereference at (null)
<1>IP: [<ffffffff812a2f6b>] strcmp+0xb/0x30
<4>PGD 1b91445067 PUD 1b098f2067 PMD 0
<4>Oops: 0000 [#1] SMP
<4>last sysfs file: /sys/module/xt_state/sections/__mcount_loc
<4>CPU 2
<4>Modules linked in: stap_5d92dc3bc2adb3726c22e6eda3972f60_31180(U) uprobes(U) dccp_diag dccp tcp_diag inet_diag bonding ipv6 ipt_LOG xt_recent xt_state xt_limit xt_comment iptable_filter iptable_raw iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_mangle ip_tables ext2 microcode ipmi_devintf iTCO_wdt iTCO_vendor_support sg power_meter acpi_ipmi ipmi_si ipmi_msghandler ixgbe ptp pps_core mdio sb_edac edac_core i2c_i801 i2c_core lpc_ich mfd_core joydev ioatdma dca shpchp ext4 jbd2 mbcache sd_mod crc_t10dif megaraid_sas xhci_hcd ahci wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_694eefb08615576ffe8f8e195c3253fa_29461]
<4>
<4>Pid: 11717, comm: nginx Not tainted 2.6.32-696.1.1.el6.x86_64 #1 Supermicro PIO-618U-T4T+-ST031/X10DRU-i+
<4>RIP: 0010:[<ffffffff812a2f6b>]  [<ffffffff812a2f6b>] strcmp+0xb/0x30
<4>RSP: 0000:ffff8810c0583bc8  EFLAGS: 00010287
<4>RAX: 000000000000002f RBX: ffff880b3c643fe8 RCX: 0000000000002dc5
<4>RDX: 00000000000000d8 RSI: 0000000000000000 RDI: ffff880b3c643fe8
<4>RBP: ffff8810c0583bc8 R08: 0000000000000000 R09: 0000000000000000
<4>R10: 0000000000000000 R11: 0000000000000246 R12: ffffffffa05df9e0
<4>R13: 00007ffc425f5000 R14: ffff88115f0b2ab0 R15: 0000000000001000
<4>FS:  00007f6b89d0d720(0000) GS:ffff880061c80000(0000) knlGS:0000000000000000
<4>CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>CR2: 0000000000000000 CR3: 0000001a3140f000 CR4: 00000000001407e0
<4>DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
<4>DR3: 0000000000000000 DR6: 00000000ffff4ff0 DR7: 0000000000000400
<4>Process nginx (pid: 11717, threadinfo ffff8810c0580000, task ffff88115f0b2ab0)
<4>Stack:
<4> ffff8810c0583c18 ffffffffa0422109 00000000000000d8 ffffffff811bc950
<4><d> ffff8810c0583c58 ffffffffa05deba0 ffffffffa05deb80 ffff88115f0b2ab0
<4><d> ffffffffa05deb90 0000000000001000 ffff8810c0583c88 ffffffffa04196ae
<4>Call Trace:
<4> [<ffffffffa0422109>] _stp_vma_mmap_cb+0xd9/0x290 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
<4> [<ffffffff811bc950>] ? mntput_no_expire+0x30/0x110
<4> [<ffffffffa04196ae>] __stp_call_mmap_callbacks+0x8e/0xf0 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
<4> [<ffffffffa0422afc>] __stp_utrace_task_finder_target_quiesce+0x36c/0x400 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
<4> [<ffffffff810e275a>] utrace_get_signal+0x3da/0x730
<4> [<ffffffff810abb5d>] ? hrtimer_try_to_cancel+0x3d/0xd0
<4> [<ffffffff81097ee6>] get_signal_to_deliver+0x316/0x460
<4> [<ffffffff8100a285>] do_signal+0x75/0x870
<4> [<ffffffff811e3c34>] ? ep_poll+0x314/0x350
<4> [<ffffffff8106c480>] ? default_wake_function+0x0/0x20
<4> [<ffffffff8100ab10>] do_notify_resume+0x90/0xc0
<4> [<ffffffff8100b3a1>] int_signal+0x12/0x17
<4>Code: 84 ff 40 88 39 74 0d 48 83 c1 01 48 83 ea 01 75 e7 c6 01 00 c9 c3 66 2e 0f 1f 84 00 00 00 00 00 55 48 89 e5 0f 1f 40 00 0f b6 07 <0f> b6 16 48 83 c7 01 48 83 c6 01 38 d0 75 0e 84 c0 75 ea 31 c0
<1>RIP  [<ffffffff812a2f6b>] strcmp+0xb/0x30
<4> RSP <ffff8810c0583bc8>
<4>CR2: 0000000000000000
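
When reading a log like the one above, a small filter can pull out the faulting symbol and the call-trace frames that belong to the SystemTap-generated module. The following is a minimal sketch that assumes the oops line formats shown above; `oops_summary` is a hypothetical helper, not part of crash, systemtap, or stapxx.

```shell
#!/bin/sh
# oops_summary: read a kernel oops on stdin (for example vmcore-dmesg.txt)
# and print the faulting symbol plus the frames owned by a stap_* module.
oops_summary() {
    awk '
        # "IP: [<addr>] symbol+off/len" names the faulting instruction.
        /IP: \[</ && !seen_ip { print "fault: " $NF; seen_ip = 1 }
        # Frames whose line ends in "[stap_<hash>_<pid>]" come from the
        # SystemTap module; the symbol is the field just before that tag.
        /\[stap_[0-9a-f]+_[0-9]+\]$/ { print "stap frame: " $(NF-1) }
    '
}

# Usage:
#   oops_summary < vmcore-dmesg.txt
```

Applied to the log above, this reports the fault in `strcmp+0xb/0x30` and the three SystemTap frames beneath it in the call trace, starting with `_stp_vma_mmap_cb+0xd9/0x290`.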

crash /usr/lib/debug/lib/modules/2.6.32-696.1.1.el6.x86_64/vmlinux ./vmcore

crash> bt
PID: 11717  TASK: ffff88115f0b2ab0  CPU: 2   COMMAND: "nginx"
 #0 [ffff8810c0583790] machine_kexec at ffffffff8103fd6b
 #1 [ffff8810c05837f0] crash_kexec at ffffffff810d1e12
 #2 [ffff8810c05838c0] oops_end at ffffffff8154ee30
 #3 [ffff8810c05838f0] no_context at ffffffff8105186b
 #4 [ffff8810c0583940] __bad_area_nosemaphore at ffffffff81051af5
 #5 [ffff8810c0583990] bad_area at ffffffff81051c1e
 #6 [ffff8810c05839c0] __do_page_fault at ffffffff81052423
 #7 [ffff8810c0583ae0] do_page_fault at ffffffff81550dbe
 #8 [ffff8810c0583b10] page_fault at ffffffff8154e0b5
    [exception RIP: strcmp+11]
    RIP: ffffffff812a2f6b  RSP: ffff8810c0583bc8  RFLAGS: 00010287
    RAX: 000000000000002f  RBX: ffff880b3c643fe8  RCX: 0000000000002dc5
    RDX: 00000000000000d8  RSI: 0000000000000000  RDI: ffff880b3c643fe8
    RBP: ffff8810c0583bc8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: ffffffffa05df9e0
    R13: 00007ffc425f5000  R14: ffff88115f0b2ab0  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff8810c0583bd0] _stp_vma_mmap_cb at ffffffffa0422109 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
#10 [ffff8810c0583c20] __stp_call_mmap_callbacks at ffffffffa04196ae [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
#11 [ffff8810c0583c90] __stp_utrace_task_finder_target_quiesce at ffffffffa0422afc [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
#12 [ffff8810c0583d00] utrace_get_signal at ffffffff810e275a
#13 [ffff8810c0583d90] get_signal_to_deliver at ffffffff81097ee6
#14 [ffff8810c0583e30] do_signal at ffffffff8100a285
#15 [ffff8810c0583f30] do_notify_resume at ffffffff8100ab10
#16 [ffff8810c0583f50] int_signal at ffffffff8100b3a1
    RIP: 0000003188ce91a3  RSP: 00007ffc4255ff88  RFLAGS: 00000246
    RAX: fffffffffffffffc  RBX: 0000000000000007  RCX: ffffffffffffffff
    RDX: 0000000000000200  RSI: 0000000001d83280  RDI: 0000000000000042
    RBP: 0000000000000001   R8: 000000000076f4e0   R9: 00007f6aaa6a0f78
    R10: 0000000000000007  R11: 0000000000000246  R12: 0000000001d734b0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
crash>

crash> bt -f
PID: 11717  TASK: ffff88115f0b2ab0  CPU: 2   COMMAND: "nginx"
 #0 [ffff8810c0583790] machine_kexec at ffffffff8103fd6b
    ffff8810c0583798: 0000000003091000 ffff880003091000
    ffff8810c05837a8: 0000000003090000 ffff8810c0583b18
    ffff8810c05837b8: 8800000000000000 000000000000ffff
    ffff8810c05837c8: ffff8810c0583b18 ffff8810c05837f8
    ffff8810c05837d8: 0000000000000009 ffff88115f0b2ab0
    ffff8810c05837e8: ffff8810c05838b8 ffffffff810d1e12
 #1 [ffff8810c05837f0] crash_kexec at ffffffff810d1e12
    ffff8810c05837f8: 0000000000001000 ffff88115f0b2ab0
    ffff8810c0583808: 00007ffc425f5000 ffffffffa05df9e0
    ffff8810c0583818: ffff8810c0583bc8 ffff880b3c643fe8
    ffff8810c0583828: 0000000000000246 0000000000000000
    ffff8810c0583838: 0000000000000000 0000000000000000
    ffff8810c0583848: 000000000000002f 0000000000002dc5
    ffff8810c0583858: 00000000000000d8 0000000000000000
    ffff8810c0583868: ffff880b3c643fe8 ffffffffffffffff
    ffff8810c0583878: ffffffff812a2f6b 0000000000000010
    ffff8810c0583888: 0000000000010287 ffff8810c0583bc8
    ffff8810c0583898: 0000000000000000 ffff8810c05838f8
    ffff8810c05838a8: 0000000000000246 ffff8810c0583b18
    ffff8810c05838b8: ffff8810c05838e8 ffffffff8154ee30
 #2 [ffff8810c05838c0] oops_end at ffffffff8154ee30
    ffff8810c05838c8: 0000000000000000 ffff8810c0583b18
    ffff8810c05838d8: 0000000000000000 0000000000000009
    ffff8810c05838e8: ffff8810c0583938 ffffffff8105186b
 #3 [ffff8810c05838f0] no_context at ffffffff8105186b
    ffff8810c05838f8: ffff88106332f020 00000014651ec6d0
    ffff8810c0583908: ffff881092304e1e 0000000000000000
    ffff8810c0583918: 0000000000000000 ffff8810c0583b18
    ffff8810c0583928: ffff88115f0b2ab0 0000000000030001
    ffff8810c0583938: ffff8810c0583988 ffffffff81051af5
 #4 [ffff8810c0583940] __bad_area_nosemaphore at ffffffff81051af5
    ffff8810c0583948: ffff8810c0583968 ffffffffa01f6593
    ffff8810c0583958: ffff8810045843a8 ffff8810c0583b18
    ffff8810c0583968: 0000000000000000 0000000000000000
    ffff8810c0583978: ffff882066af1250 ffff88115f0b2ab0
    ffff8810c0583988: ffff8810c05839b8 ffffffff81051c1e
 #5 [ffff8810c0583990] bad_area at ffffffff81051c1e
    ffff8810c0583998: ffffffff81477808 0000000000000028
    ffff8810c05839a8: 0000000000000000 ffff881cfa6a2b80
    ffff8810c05839b8: ffff8810c0583ad8 ffffffff81052423
 #6 [ffff8810c05839c0] __do_page_fault at ffffffff81052423
    ffff8810c05839c8: ffff8810c0583a18 ffffffff8149de08
    ffff8810c05839d8: ffff8810c0583b18 0000000000000000
    ffff8810c05839e8: ffff881cfa6a2be8 0000000000000000
    ffff8810c05839f8: ffff88205d447600 ffff881067060020
    ffff8810c0583a08: ffff881061123180 0000000000000246
    ffff8810c0583a18: ffff8810c0583a58 ffffffff81480924
    ffff8810c0583a28: ffffffffa02ccd78 ffff88106586d240
    ffff8810c0583a38: ffff8810045843a8 ffff88106332f6e0
    ffff8810c0583a48: 0000000000000002 ffff8810045843a8
    ffff8810c0583a58: ffff8810c0583a78 ffffffffa0398821
    ffff8810c0583a68: ffff8810045843a8 ffff88106586d240
    ffff8810c0583a78: ffff8810c0583aa8 ffffffffa039a343
    ffff8810c0583a88: ffff8810045843a8 ffff88106332f6e0
    ffff8810c0583a98: ffff88106332f020 ffff88106332f6e8
    ffff8810c0583aa8: ffff8810c0583ae8 ffff8810c0583b18
    ffff8810c0583ab8: 0000000000000000 0000000000000000
    ffff8810c0583ac8: ffff88115f0b2ab0 0000000000001000
    ffff8810c0583ad8: ffff8810c0583b08 ffffffff81550dbe
 #7 [ffff8810c0583ae0] do_page_fault at ffffffff81550dbe
    ffff8810c0583ae8: 0000000000000001 ffffffffa05df9e0
    ffff8810c0583af8: 00007ffc425f5000 ffff88115f0b2ab0
    ffff8810c0583b08: ffff8810c0583bc8 ffffffff8154e0b5
 #8 [ffff8810c0583b10] page_fault at ffffffff8154e0b5
    [exception RIP: strcmp+11]
    RIP: ffffffff812a2f6b  RSP: ffff8810c0583bc8  RFLAGS: 00010287
    RAX: 000000000000002f  RBX: ffff880b3c643fe8  RCX: 0000000000002dc5
    RDX: 00000000000000d8  RSI: 0000000000000000  RDI: ffff880b3c643fe8
    RBP: ffff8810c0583bc8   R8: 0000000000000000   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000246  R12: ffffffffa05df9e0
    R13: 00007ffc425f5000  R14: ffff88115f0b2ab0  R15: 0000000000001000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
    ffff8810c0583b18: 0000000000001000 ffff88115f0b2ab0
    ffff8810c0583b28: 00007ffc425f5000 ffffffffa05df9e0
    ffff8810c0583b38: ffff8810c0583bc8 ffff880b3c643fe8
    ffff8810c0583b48: 0000000000000246 0000000000000000
    ffff8810c0583b58: 0000000000000000 0000000000000000
    ffff8810c0583b68: 000000000000002f 0000000000002dc5
    ffff8810c0583b78: 00000000000000d8 0000000000000000
    ffff8810c0583b88: ffff880b3c643fe8 ffffffffffffffff
    ffff8810c0583b98: ffffffff812a2f6b 0000000000000010
    ffff8810c0583ba8: 0000000000010287 ffff8810c0583bc8
    ffff8810c0583bb8: 0000000000000000 0000000000001000
    ffff8810c0583bc8: ffff8810c0583c18 ffffffffa0422109
 #9 [ffff8810c0583bd0] _stp_vma_mmap_cb at ffffffffa0422109 [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
    ffff8810c0583bd8: 00000000000000d8 ffffffff811bc950
    ffff8810c0583be8: ffff8810c0583c58 ffffffffa05deba0
    ffff8810c0583bf8: ffffffffa05deb80 ffff88115f0b2ab0
    ffff8810c0583c08: ffffffffa05deb90 0000000000001000
    ffff8810c0583c18: ffff8810c0583c88 ffffffffa04196ae
#10 [ffff8810c0583c20] __stp_call_mmap_callbacks at ffffffffa04196ae [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
    ffff8810c0583c28: 0000000000000000 0000000008040074
    ffff8810c0583c38: ffff8810c0583c68 00007ffc425f5000
    ffff8810c0583c48: ffff8808fc4d7900 ffff880b3c643fe8
    ffff8810c0583c58: ffffffffa05deb80 ffff88115f0b2ab0
    ffff8810c0583c68: ffff880a54e2b3e0 ffffffffa05deb80
    ffff8810c0583c78: 000000000000006b 000000000000006a
    ffff8810c0583c88: ffff8810c0583cf8 ffffffffa0422afc
#11 [ffff8810c0583c90] __stp_utrace_task_finder_target_quiesce at ffffffffa0422afc [stap_5d92dc3bc2adb3726c22e6eda3972f60_31180]
    ffff8810c0583c98: 0000000000000000 0000000008040074
    ffff8810c0583ca8: ffff881cfa6a2b80 ffff880b3c643fe8
    ffff8810c0583cb8: ffff880a54e2a000 ffff880b3c643000
    ffff8810c0583cc8: 0000000000000002 ffff88131ab524e0
    ffff8810c0583cd8: ffff8820103ed450 ffff88115f0b2ab0
    ffff8810c0583ce8: ffff8810c0583ed8 ffff88131ab524e8
    ffff8810c0583cf8: ffff8810c0583d88 ffffffff810e275a
#12 [ffff8810c0583d00] utrace_get_signal at ffffffff810e275a
    ffff8810c0583d08: 00000060c0583d48 0000000000000000
    ffff8810c0583d18: 0000000000000000 ffff8810c0583f58
    ffff8810c0583d28: ffff8810c0583e58 0000000000000001
    ffff8810c0583d38: 0000000500000060 0000010000000005
    ffff8810c0583d48: ffff8810c0583d88 ffffffff810abb5d
    ffff8810c0583d58: ffff8810c0583de8 ffff8810c0583f58
    ffff8810c0583d68: ffff88115f0b2ab0 ffff8810c0583e58
    ffff8810c0583d78: ffff88114058f5c0 ffff88115f0b2ab0
    ffff8810c0583d88: ffff8810c0583e28 ffffffff81097ee6
#13 [ffff8810c0583d90] get_signal_to_deliver at ffffffff81097ee6
    ffff8810c0583d98: 000000000000c350 ffff8810c0583db8
    ffff8810c0583da8: ffff8810c0583e48 ffff88115f0b3128
    ffff8810c0583db8: ffff8810c0583ed8 ffff8810c0583f58
    ffff8810c0583dc8: ffff88115f0b2ab0 ffff88115f0b2ab0
    ffff8810c0583dd8: ffff88115f0b2ab0 ffff882065225e48
    ffff8810c0583de8: ffff882065225640 ffff88115f0b3228
    ffff8810c0583df8: 0000000000002dc5 ffff8810c0583f58
    ffff8810c0583e08: ffff8810c0583ed8 ffff8810c0583e58
    ffff8810c0583e18: 0000000000000000 ffff88115f0b3228
    ffff8810c0583e28: ffff8810c0583f28 ffffffff8100a285
#14 [ffff8810c0583e30] do_signal at ffffffff8100a285
    ffff8810c0583e38: 0000000000000000 0000000000000286
    ffff8810c0583e48: ffff8810c0583f38 ffffffff811e3c34
    ffff8810c0583e58: 0000000000000286 ffff8811fffffffc
    ffff8810c0583e68: 00000200054b6300 0000000001d83280
    ffff8810c0583e78: 000000000004d17d 000000002267d4b8
    ffff8810c0583e88: ffff881000000001 ffff88115f0b2ab0
    ffff8810c0583e98: ffffffff8106c480 dead000000100100
    ffff8810c0583ea8: dead000000200200 ffff880c054b6300
    ffff8810c0583eb8: 0000000000000000 00000000006acfc0
    ffff8810c0583ec8: 000000000004d17d 000000002267d4b8
    ffff8810c0583ed8: 0000000000000000 0000000000000000
    ffff8810c0583ee8: 0000000000000000 0000000000000000
    ffff8810c0583ef8: 00011f31c3e676b8 0000000000000006
    ffff8810c0583f08: ffff8810c0583f58 0000000000000000
    ffff8810c0583f18: 0000000000000000 0000000000000000
    ffff8810c0583f28: ffff8810c0583f48 ffffffff8100ab10
#15 [ffff8810c0583f30] do_notify_resume at ffffffff8100ab10
    ffff8810c0583f38: 0000000000000007 0000000001d734b0
    ffff8810c0583f48: 0000000000000001 ffffffff8100b3a1
#16 [ffff8810c0583f50] int_signal at ffffffff8100b3a1
    RIP: 0000003188ce91a3  RSP: 00007ffc4255ff88  RFLAGS: 00000246
    RAX: fffffffffffffffc  RBX: 0000000000000007  RCX: ffffffffffffffff
    RDX: 0000000000000200  RSI: 0000000001d83280  RDI: 0000000000000042
    RBP: 0000000000000001   R8: 000000000076f4e0   R9: 00007f6aaa6a0f78
    R10: 0000000000000007  R11: 0000000000000246  R12: 0000000001d734b0
    R13: 0000000000000000  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 00000000000000e8  CS: 0033  SS: 002b
crash>

crash> ps output is at https://gist.github.com/weldpua2008/5d19c26b80bfbdd0566561a9cbd3cde6
crash> vm output is at https://gist.github.com/weldpua2008/e50b3f677016cd6bb523cfbd6389fdd8
crash> files output is at https://gist.github.com/weldpua2008/e9068f890bfbb05ac083f6365a99beb0

hamishforbes commented 7 years ago

Hi, did you ever work out a solution for this?

I'm having similar issues. I'm not running from cron, just trying to pin down an intermittent hot-loop problem.

I've found that running the ngx-active-reqs script from openresty-systemtap-toolkit and then lj-lua-bt causes a crash almost every time. This is on an up-to-date CentOS 6.8 system.

agentzh commented 7 years ago

@hamishforbes Ensure you build the latest systemtap from its source release. Do not use the version in the system yum repository! It's ancient and very buggy.
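
To confirm which SystemTap version actually built a loaded module, the banner that the module writes to the kernel log can be checked; the vmcore-dmesg.txt earlier in this thread shows `systemtap: 2.9/0.164`. The following is a minimal sketch that assumes that banner format; `stap_runtime_version` is a hypothetical helper name.

```shell
#!/bin/sh
# stap_runtime_version: read kernel log lines on stdin and print the
# SystemTap version from the first module banner of the form
# "stap_<hash>_<pid>: systemtap: <version>/<elfutils version>, ...".
stap_runtime_version() {
    sed -n 's/.*systemtap: \([0-9][0-9.]*\)\/.*/\1/p' | head -n 1
}

# Usage:
#   dmesg | stap_runtime_version
#   stap_runtime_version < vmcore-dmesg.txt
```

If the printed version still matches the distribution package after a source build, the older stap binary is probably still first in `PATH`.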