openresty / openresty-systemtap-toolkit

Real-time analysis and diagnostics tools for OpenResty (including NGINX, LuaJIT, ngx_lua, and more) based on SystemTap

Pools leaked at backtrace - question #36

Open pavelnemirovsky opened 7 years ago

pavelnemirovsky commented 7 years ago

Hello,

Can you please guide me as to why I was unable to identify the leaked pools using "ngx-backtrace"? The output appears as below. We are running the latest version of OpenResty, not the debug one!

Thanks in advance,

P

[root@server nginx-systemtap-toolkit]# ./ngx-leaked-pools -p 8944
Tracing 8944 (/usr/local/openresty/nginx/sbin/nginx)...
Hit Ctrl-C to end.
^C
277 pools leaked at backtrace 0x41e5da 0x43b977 0x43af02 0x43ac59 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
1 pools leaked at backtrace 0x41e5da 0x4f5965 0x7fca24427db3 0x4deb10 0x4f6189 0x43aeb3 0x43ac71 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
1 pools leaked at backtrace 0x41e5da 0x45792b 0x45a70d 0x443945 0x43ac45 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789

Run the command "./ngx-backtrace -p 8944 <backtrace>" to get details.
For total 1493 pools allocated.
[root@s19 nginx-systemtap-toolkit]# /ngx-backtrace -p 8944 0x41e5da 0x43b977 0x43af02 0x43ac59 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
-bash: /ngx-backtrace: No such file or directory
[root@server nginx-systemtap-toolkit]# ./ngx-backtrace -p 8944 0x41e5da 0x43b977 0x43af02 0x43ac59 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
??
??:0
agentzh commented 7 years ago

@pavelnemirovsky Seems like your nginx binary lacks debug symbols (aka debuginfo)?
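For example (a minimal sketch; the path assumes the default OpenResty install prefix), you can check whether the binary itself was stripped:

# does the nginx binary still embed a symbol table?
file /usr/local/openresty/nginx/sbin/nginx

If file reports the binary as stripped, the symbols must come from a separate debuginfo file instead.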

agentzh commented 7 years ago

@pavelnemirovsky BTW, official OpenResty or NGINX distributions always build with debug symbols out of the box (note: it has nothing to do with debugging builds with ./configure --with-debug).

pavelnemirovsky commented 7 years ago

Thanks a lot agentzh, that's exactly my concern, since these are the original OpenResty packages installed from your repo:

[root@server ~]# rpm -q -a | grep openresty
openresty-debuginfo-1.11.2.1-3.el6.x86_64
openresty-pcre-debuginfo-8.39-3.el6.x86_64
openresty-zlib-1.2.8-1.el6.x86_64
openresty-openssl-1.0.2j-1.el6.x86_64
openresty-openssl-debug-1.0.2j-1.el6.x86_64
openresty-1.11.2.1-3.el6.x86_64
openresty-openssl-debuginfo-1.0.2j-1.el6.x86_64
openresty-pcre-8.39-3.el6.x86_64
openresty-debug-1.11.2.1-3.el6.x86_64
openresty-zlib-debuginfo-1.2.8-1.el6.x86_64

[root@server nginx-systemtap-toolkit]# ps ax | grep nginx
10701 pts/1    S+     0:00 grep nginx
31513 ?        Ss     0:03 nginx: master process /usr/local/openresty/nginx/sbin/nginx -c /etc/nginx/nginx.conf
31517 ?        S<    95:45 nginx: worker process
31525 ?        R<    95:34 nginx: worker process
31526 ?        R<    94:46 nginx: worker process
31531 ?        S<    95:39 nginx: worker process
31532 ?        R<    94:10 nginx: worker process
31539 ?        S<    95:51 nginx: worker process
31551 ?        R<    94:59 nginx: worker process
31553 ?        S<    95:32 nginx: worker process
31558 ?        S<    94:39 nginx: worker process

[root@server nginx-systemtap-toolkit]# ./check-debug-info -p 31513
File /lib64/ld-2.12.so has no debug info embedded.
File /lib64/libc-2.12.so has no debug info embedded.
File /lib64/libcrypt-2.12.so has no debug info embedded.
File /lib64/libdl-2.12.so has no debug info embedded.
File /lib64/libfreebl3.so has no debug info embedded.
File /lib64/libm-2.12.so has no debug info embedded.
File /lib64/libnss_dns-2.12.so has no debug info embedded.
File /lib64/libnss_files-2.12.so has no debug info embedded.
File /lib64/libpthread-2.12.so has no debug info embedded.
File /lib64/libresolv-2.12.so has no debug info embedded.
File /usr/lib64/lua/5.1/lfs.so has no debug info embedded.
File /usr/local/lib/lua/5.1/libmaxminddb.so has no debug info embedded.
File /usr/local/lib/lua/5.1/usersshm.so has no debug info embedded.
File /usr/local/openresty/lualib/cjson.so has no debug info embedded.
File /usr/local/openresty/nginx/sbin/nginx has no debug info embedded.
pavelnemirovsky commented 7 years ago

Looks like this is the problem:

[root@server nginx-systemtap-toolkit]# ./ngx-backtrace -p31517 0x41e5da
Nginx path /usr/local/openresty/nginx/sbin/nginx??
??:0
agentzh commented 7 years ago

@pavelnemirovsky Okay, seems like you have already installed the openresty-debuginfo package.

I guess your addr2line utility invoked by the ngx-backtrace tool does not yet support debuginfo data separated from the binary executable.

Please try the following patch for ngx-leaked-pools to let systemtap print out the backtrace symbols directly:

diff --git a/ngx-leaked-pools b/ngx-leaked-pools
index ec39086..5fc02e5 100755
--- a/ngx-leaked-pools
+++ b/ngx-leaked-pools
@@ -108,7 +108,7 @@ probe end {

     hits = 0
     foreach (bt in btcount- limit 10) {
-        printf("%d pools leaked at backtrace %s\\n", btcount[bt], bt)
+        printf("%d pools leaked at backtrace:\\n%s\\n", btcount[bt], print_ustack(bt))
         hits++
     }

Please let me know if this patch works for you. Thanks!

pavelnemirovsky commented 7 years ago

Done

[root@server nginx-systemtap-toolkit]# ./ngx-leaked-pools -p 31517
Tracing 31517 (/usr/local/openresty/nginx/sbin/nginx)...
Hit Ctrl-C to end.
^C
 0x41e5da : 0x41e5da
 0x43b977 : 0x43b977
 0x43af02 : 0x43af02
 0x43ac59 : 0x43ac59
 0x4417b8 : 0x4417b8
 0x43fdf7 : 0x43fdf7
 0x440cbc : 0x440cbc
 0x441e44 : 0x441e44
 0x41d5bf : 0x41d5bf
 0x3953c1ed1d : 0x3953c1ed1d
 0x41b789 : 0x41b789
349 pools leaked at backtrace:

 0x41e5da : 0x41e5da
 0x45792b : 0x45792b
 0x45a8f7 : 0x45a8f7
 0x443945 : 0x443945
 0x43ac45 : 0x43ac45
 0x4417b8 : 0x4417b8
 0x43fdf7 : 0x43fdf7
 0x440cbc : 0x440cbc
 0x441e44 : 0x441e44
 0x41d5bf : 0x41d5bf
 0x3953c1ed1d : 0x3953c1ed1d
 0x41b789 : 0x41b789
2 pools leaked at backtrace:

 0x41e5da : 0x41e5da
 0x4f5965 : 0x4f5965
 0x7f722888ddb3 : 0x7f722888ddb3
 0x4deb10 : 0x4deb10
 0x4f6189 : 0x4f6189
 0x43aeb3 : 0x43aeb3
 0x43ac71 : 0x43ac71
 0x4417b8 : 0x4417b8
 0x43fdf7 : 0x43fdf7
 0x440cbc : 0x440cbc
 0x441e44 : 0x441e44
 0x41d5bf : 0x41d5bf
 0x3953c1ed1d : 0x3953c1ed1d
 0x41b789 : 0x41b789
1 pools leaked at backtrace:

Run the command "./ngx-backtrace -p 31517 <backtrace>" to get details.
For total 1937 pools allocated.
pavelnemirovsky commented 7 years ago

[root@server nginx-systemtap-toolkit]# rpm -q -a | grep systemtap
systemtap-2.9-4.el6.x86_64
systemtap-devel-2.9-4.el6.x86_64
systemtap-runtime-2.9-4.el6.x86_64
systemtap-client-2.9-4.el6.x86_64

agentzh commented 7 years ago

@pavelnemirovsky Strange. Maybe you should try building the latest version of systemtap and elfutils from source?

pavelnemirovsky commented 7 years ago

Maybe this is the problem?

[root@s19 nginx-systemtap-toolkit]# addr2line -v
GNU addr2line version 2.20.51.0.2-5.44.el6 20100205
Copyright 2009 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) any later version.
This program has absolutely no warranty.
agentzh commented 7 years ago

@pavelnemirovsky This doc might be helpful. But you also need to update the version numbers there: https://openresty.org/en/build-systemtap.html

BTW, also please ensure that you have installed those kernel-* packages matching your current running kernel: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/SystemTap_Beginners_Guide/using-systemtap.html
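The build in that doc boils down to roughly the following (a sketch only; pick the current release version and adjust the prefix to taste):

# build systemtap from source (version and prefix are placeholders)
wget https://sourceware.org/systemtap/ftp/releases/systemtap-3.0.tar.gz
tar xzf systemtap-3.0.tar.gz
cd systemtap-3.0
./configure --prefix=/opt/stap
make -j$(nproc)
sudo make install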

pavelnemirovsky commented 7 years ago

I actually followed the guide in the first place:

[root@s19 nginx-systemtap-toolkit]# rpm -q -a | grep kernel
kernel-devel-2.6.32-642.6.1.el6.x86_64
kernel-headers-2.6.32-642.6.1.el6.x86_64
libreport-plugin-kerneloops-2.0.9-32.el6.centos.x86_64
kernel-2.6.32-573.el6.x86_64
kernel-firmware-2.6.32-642.6.1.el6.noarch
kernel-devel-2.6.32-642.6.2.el6.x86_64
kernel-debuginfo-common-x86_64-2.6.32-642.6.1.el6.x86_64
dracut-kernel-004-409.el6_8.2.noarch
kernel-2.6.32-642.6.1.el6.x86_64
kernel-debug-debuginfo-2.6.32-642.6.1.el6.x86_64
abrt-addon-kerneloops-2.0.8-40.el6.centos.x86_64
agentzh commented 7 years ago

@pavelnemirovsky BTW, you can also use gdb to translate those backtrace frame addresses like below:

(gdb) info symbol 0x41e5da

Your current gdb session needs to get attached to one of your nginx processes first though.
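A non-interactive variant (a sketch; substitute one of your own worker PIDs) would be:

# attach, resolve one frame address, and detach in a single shot
gdb -batch -p 31517 -ex 'info symbol 0x41e5da'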

pavelnemirovsky commented 7 years ago

Thanks a lot, here is what I get:

(gdb) info symbol 0x41e5da
ngx_create_pool + 122 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x43b977
ngx_event_accept + 583 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x43af02
ngx_event_process_posted + 50 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x43ac59
ngx_process_events_and_timers + 121 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x4417b8
ngx_worker_process_cycle + 56 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x43fdf7
ngx_spawn_process + 375 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x440cbc
ngx_start_worker_processes + 108 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x441e44
ngx_master_process_cycle + 452 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x41d5bf
main + 2719 in section .text of /usr/local/openresty/nginx/sbin/nginx
(gdb) info symbol 0x3953c1ed1d
__libc_start_main + 253 in section .text of /lib64/libc.so.6
(gdb) info symbol 0x41b789
_start + 41 in section .text of /usr/local/openresty/nginx/sbin/nginx

agentzh commented 7 years ago

@pavelnemirovsky Good, gdb works, to some extent. But the backtraces are still incomplete. The debuginfo does not really take effect here. Strange.

BTW, your backtrace above is a false positive due to the sampling window boundaries of your last run of the ngx-leaked-pools tool.

agentzh commented 7 years ago

I think you should give more details about your operating system. It's CentOS? CentOS 6.x? What version?

pavelnemirovsky commented 7 years ago

Great, here it is: CentOS release 6.8 (Final)

pavelnemirovsky commented 7 years ago

My concern is this:

[root@server nginx-systemtap-toolkit]# stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
Pass 1: parsed user script and 111 library script(s) using 206796virt/34536res/3216shr/31672data kb, in 280usr/20sys/294real ms.
semantic error: while resolving probe point: identifier 'kernel' at /usr/share/systemtap/tapset/linux/vfs.stp:882:18
        source: probe vfs.read = kernel.function("vfs_read")
                                 ^

semantic error: missing x86_64 kernel/module debuginfo [man warning::debuginfo] under '/lib/modules/2.6.32-642.6.1.el6.x86_64/build'

semantic error: while resolving probe point: identifier 'vfs' at <input>:1:7
        source: probe vfs.read {printf("read performed\n"); exit()}
                      ^

semantic error: no match

Pass 2: analyzed script: 0 probe(s), 0 function(s), 0 embed(s), 0 global(s) using 214480virt/42084res/5160shr/37252data kb, in 100usr/150sys/254real ms.
Pass 2: analysis failed.  [man error::pass2]

I just compiled stap with elfutils:

[root@server systemtap-3.0]# ./stap -v -e 'probe vfs.read {printf("read performed\n"); exit()}'
Pass 1: parsed user script and 0 library scripts using 131048virt/6680res/2048shr/4936data kb, in 0usr/0sys/0real ms.
semantic error: while resolving probe point: identifier 'vfs' at <input>:1:7
        source: probe vfs.read {printf("read performed\n"); exit()}
                      ^

semantic error: probe point mismatch (similar: procfs, end, never, perf, begin): identifier 'vfs' at <input>:1:7
        source: probe vfs.read {printf("read performed\n"); exit()}
                      ^

Pass 2: analyzed script: 0 probes, 0 functions, 0 embeds, 0 globals using 131048virt/7116res/2424shr/4936data kb, in 0usr/10sys/3real ms.
Pass 2: analysis failed.  [man error::pass2]

agentzh commented 7 years ago

@pavelnemirovsky Maybe your running kernel does not match the kernel-debuginfo* packages you have installed. Check with the uname -r command.

pavelnemirovsky commented 7 years ago
[root@server systemtap-3.0]# uname -r
2.6.32-642.6.1.el6.x86_64
agentzh commented 7 years ago

@pavelnemirovsky Indeed. You have no kernel-debuginfo-2.6.32-642.6.1.el6.x86_64 installed in your system. You should check it out yourself. I don't really have the time to do everything for you.

pavelnemirovsky commented 7 years ago

That package was missing; I installed it with "yum install kernel-debuginfo-2.6.32-642.6.1.el6.x86_64", but I still see the same result.

agentzh commented 7 years ago

@pavelnemirovsky Then you should really ask on the systemtap mailing list instead. It's already OT here.

pavelnemirovsky commented 7 years ago

We were in sync; it was missed in my CLI output because of the similarly named "kernel-debug-debuginfo" package.

pavelnemirovsky commented 7 years ago

I will, but systemtap looks fine to me right now; this one, however, still doesn't work:

[root@server nginx-systemtap-toolkit]# ./ngx-backtrace -p 37848 0x41e5da
??
??:0

agentzh commented 7 years ago

@pavelnemirovsky If your addr2line does not support separate debuginfo files, then there is nothing we can do to make it work in ngx-backtrace.

If instead the openresty-debuginfo package installs its debuginfo files into the wrong locations, then we need your help to determine that (by checking the file paths inside other *-debuginfo packages, for example).
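For example (a sketch, using the package names from your rpm -qa listing above):

# where does the debuginfo package actually put its files?
rpm -ql openresty-debuginfo | grep '\.debug'

Comparing those paths against the ones addr2line actually probes (e.g. via strace addr2line ...) should reveal any mismatch.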

pavelnemirovsky commented 7 years ago

Got it, checking...

pavelnemirovsky commented 7 years ago

Looks like this is the issue: addr2line is looking in a different place from where the RPM installs the debuginfo files...

[root@server nginx-systemtap-toolkit]# strace addr2line -e /proc/37848/exe -f 0x41e5da
execve("/usr/bin/addr2line", ["addr2line", "-e", "/proc/37848/exe", "-f", "0x41e5da"], [/* 29 vars */]) = 0
brk(0)                                  = 0x133f000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0f5b77000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=41129, ...}) = 0
mmap(NULL, 41129, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7fa0f5b6c000
close(3)                                = 0
open("/usr/lib64/libbfd-2.20.51.0.2-5.44.el6.so", O_RDONLY) = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@\316\202S9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=949728, ...}) = 0
mmap(0x3953800000, 3059480, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3953800000
mprotect(0x39538d6000, 2097152, PROT_NONE) = 0
mmap(0x3953ad6000, 69632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0xd6000) = 0x3953ad6000
mmap(0x3953ae7000, 16152, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3953ae7000
close(3)                                = 0
open("/lib64/libz.so.1", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 !\300T9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=91096, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0f5b6b000
mmap(0x3954c00000, 2183696, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3954c00000
mprotect(0x3954c15000, 2093056, PROT_NONE) = 0
mmap(0x3954e14000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x14000) = 0x3954e14000
close(3)                                = 0
open("/lib64/libc.so.6", O_RDONLY)      = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0000\356\301S9\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1928936, ...}) = 0
mmap(0x3953c00000, 3750184, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x3953c00000
mprotect(0x3953d8a000, 2097152, PROT_NONE) = 0
mmap(0x3953f8a000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x18a000) = 0x3953f8a000
mmap(0x3953f90000, 14632, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x3953f90000
close(3)                                = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0f5b6a000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0f5b69000
arch_prctl(ARCH_SET_FS, 0x7fa0f5b6a700) = 0
mprotect(0x3954e14000, 4096, PROT_READ) = 0
mprotect(0x3953f8a000, 16384, PROT_READ) = 0
mprotect(0x395361f000, 4096, PROT_READ) = 0
munmap(0x7fa0f5b6c000, 41129)           = 0
brk(0)                                  = 0x133f000
brk(0x1360000)                          = 0x1360000
stat("/proc/37848/exe", {st_mode=S_IFREG|0755, st_size=1490728, ...}) = 0
open("/proc/37848/exe", O_RDONLY)       = 3
fcntl(3, F_GETFD)                       = 0
fcntl(3, F_SETFD, FD_CLOEXEC)           = 0
fstat(3, {st_mode=S_IFREG|0755, st_size=1490728, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0f5b76000
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0`\267A\0\0\0\0\0"..., 4096) = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 4096, SEEK_SET)                = 4096
lseek(3, 1486848, SEEK_SET)             = 1486848
read(3, "stapsdt\0_\7N\0\0\0\0\0000\303R\0\0\0\0\0\200\245v\0\0\0\0\0"..., 4096) = 3880
lseek(3, 1490728, SEEK_SET)             = 1490728
lseek(3, 1490728, SEEK_SET)             = 1490728
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0`\267A\0\0\0\0\0"..., 4096) = 4096
lseek(3, 1486848, SEEK_SET)             = 1486848
read(3, "stapsdt\0_\7N\0\0\0\0\0000\303R\0\0\0\0\0\200\245v\0\0\0\0\0"..., 4096) = 3880
lseek(3, 0, SEEK_SET)                   = 0
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\2\0>\0\1\0\0\0`\267A\0\0\0\0\0"..., 4096) = 4096
lseek(3, 1482752, SEEK_SET)             = 1482752
read(3, "\200\235u\0\0\0\0\0`\237u\0\0\0\0\0@\241u\0\0\0\0\0 \243u\0\0\0\0\0"..., 4096) = 4096
read(3, "stapsdt\0_\7N\0\0\0\0\0000\303R\0\0\0\0\0\200\245v\0\0\0\0\0"..., 4096) = 3880
lstat("/proc", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
lstat("/proc/37848", {st_mode=S_IFDIR|0555, st_size=0, ...}) = 0
lstat("/proc/37848/exe", {st_mode=S_IFLNK|0777, st_size=0, ...}) = 0
readlink("/proc/37848/exe", "/usr/local/openresty/nginx/sbin/"..., 4095) = 37
lstat("/usr", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/local", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/local/openresty", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/local/openresty/nginx", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/local/openresty/nginx/sbin", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
lstat("/usr/local/openresty/nginx/sbin/nginx", {st_mode=S_IFREG|0755, st_size=1490728, ...}) = 0
open("/proc/37848/nginx.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/proc/37848/.debug/nginx.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib64/debug/usr/local/openresty/nginx/sbin/nginx.debug", O_RDONLY) = -1 ENOENT (No such file or directory)
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fa0f5b75000
write(1, "??\n", 3??
)                     = 3
write(1, "??:0\n", 5??:0
)                   = 5
close(3)                                = 0
munmap(0x7fa0f5b76000, 4096)            = 0
exit_group(0)                           = ?
+++ exited with 0 +++
pavelnemirovsky commented 7 years ago

Meanwhile I had to do this on my machine: "ln -s /usr/lib/debug /usr/lib64/debug". Do you want me to submit a PR and change the spec file to support the lib64 path for openresty-debuginfo?

pavelnemirovsky commented 7 years ago

Now I see the proper backtrace. Thank you so much, I really appreciate your help and will definitely support your project. Regarding the output: we constantly see nginx workers leaking memory, i.e. process memory consumption constantly increasing. We use a few packages that were installed without debug info; I'll prepare a proper output later to resolve the missing backtrace symbols.

Tracing 37848 (/usr/local/openresty/nginx/sbin/nginx)...
Hit Ctrl-C to end.
^C
509 pools leaked at backtrace 0x41e5da 0x43b977 0x43af02 0x43ac59 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
3 pools leaked at backtrace 0x41e5da 0x45792b 0x45a70d 0x443945 0x43ac45 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
1 pools leaked at backtrace 0x41e5da 0x4f5965 0x7f03894d0db3 0x4deb10 0x4f6189 0x43aeb3 0x43ac71 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
1 pools leaked at backtrace 0x41e5da 0x45792b 0x45a8f7 0x443945 0x43ac45 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
1 pools leaked at backtrace 0x41e5da 0x45792b 0x45a8f7 0x43af02 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789

[root@server nginx-systemtap-toolkit]# ./ngx-backtrace -p 37848 0x41e5da 0x43b977 0x43af02 0x43ac59 0x4417b8 0x43fdf7 0x440cbc 0x441e44 0x41d5bf 0x3953c1ed1d 0x41b789
ngx_create_pool
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/core/ngx_palloc.c:43
ngx_event_accept
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/event/ngx_event_accept.c:162
ngx_event_process_posted
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/event/ngx_event_posted.c:23
ngx_process_events_and_timers
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/event/ngx_event.c:251
ngx_worker_process_cycle
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/os/unix/ngx_process_cycle.c:755
ngx_spawn_process
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/os/unix/ngx_process.c:205
ngx_start_worker_processes
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/os/unix/ngx_process_cycle.c:361
ngx_master_process_cycle
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/os/unix/ngx_process_cycle.c:132
main
/usr/src/debug/openresty-1.11.2.1/build/nginx-1.11.2/src/core/nginx.c:367
??
??:0
??
??:0
agentzh commented 7 years ago

@pavelnemirovsky There are no real leaked memory pools according to your output. You can increase your sampling time window to see if the counts in the output grow proportionally; otherwise it's just noise due to your time window boundaries.

agentzh commented 7 years ago

@pavelnemirovsky It's important to find out whether your nginx worker's RES memory size can indeed grow without bound. Otherwise the freed memory may just get cached by your glibc library in userland for future allocations, or simply cannot be returned to the OS for other reasons (like memory fragmentation). The next version of OpenResty (1.11.2.2 RC1 atm) introduces the lua_malloc_trim directive, which often makes the memory footprint much smaller after traffic peaks than previous versions (including 1.11.2.1) by forcing glibc to return cached free memory to the OS. If it's due to glibc caching or memory fragmentation, then such high memory usage is not really a leak.
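For example, in nginx.conf (a sketch; 1000 requests is the directive's documented default, and 0 disables trimming):

http {
    # call glibc's malloc_trim() once every 1000 requests
    lua_malloc_trim 1000;
    ...
}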

For tracking real memory leaks, the following tools are more useful than ngx-leaked-pools since leaking memory pools are rare in the wild anyway. See

https://github.com/openresty/stapxx#sample-bt-leaks

https://github.com/openresty/stapxx#lj-gc

https://github.com/openresty/stapxx#lj-gc-objs
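A typical run of the first tool looks roughly like this (a sketch following the stapxx README; the PID and sampling time are yours to adjust, and stackcollapse-stap.pl/flamegraph.pl come from the separate FlameGraph repo):

./samples/sample-bt-leaks.sxx -x 16677 --arg time=5 -D MAXACTION=10000 > a.bt
stackcollapse-stap.pl a.bt > a.cbt
flamegraph.pl --countname=bytes a.cbt > a.svg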

agentzh commented 7 years ago

@pavelnemirovsky The /usr/lib64/debug/ path is a good find. I wonder how we can fix that while still keeping our spec file portable (we still need to support other RPM-based distributions like Fedora, as well as 32-bit architectures). PRs welcome :) Thanks for your report!

pavelnemirovsky commented 7 years ago

Thanks again agentzh, your inputs are extremely valuable. I actually tested the lua_malloc_trim functionality as part of lua-nginx-module 0.10.7, but I didn't notice any change in terms of memory consumption/release; maybe I'm missing something. We observe that NGINX workers start to grow and allocate more and more memory right after startup, and once we reach 2.2 GB per worker the GC cycle appears to work more aggressively in terms of CPU consumption, so we manually restart the whole nginx service at 4-hour intervals. You can see below what this looks like as a graph. Regarding the rest of the things you proposed, I'll try to grab more info and reply later. Thanks again.

[image: graph of nginx worker memory consumption over time]

pavelnemirovsky commented 7 years ago

Here is the output of the 3 tools you proposed:

[root@server stapxx]# ./stap++ samples/lj-gc.sxx -x 16677
Found exact match for libluajit: /usr/local/openresty/luajit/lib/libluajit-5.1.so.2.1.0
Start tracing 16677 (/usr/local/openresty/nginx/sbin/nginx)
Total GC count: 603957268 byte
[root@server stapxx]# ./stap++ samples/lj-gc-objs.sxx -x 16677 -D MAXACTION=90000000
Found exact match for libluajit: /usr/local/openresty/luajit/lib/libluajit-5.1.so.2.1.0
ERROR: Skipped too many probes, check MAXSKIPPED or try again with stap -t for more details.
Start tracing 16677 (/usr/local/openresty/nginx/sbin/nginx)

main machine code area size: 262144 bytes
C callback machine code size: 4096 bytes
GC total size: 958903296 bytes
GC state: sweep-string

5565968 table objects: max=12582944, avg=138, min=32, sum=768677744 (in bytes)
1113062 upvalue objects: max=24, avg=24, min=24, sum=26713488 (in bytes)
875271 string objects: max=37510, avg=124, min=18, sum=108583139 (in bytes)
563922 userdata objects: max=19680, avg=36, min=27, sum=20455103 (in bytes)
479492 function objects: max=144, avg=38, min=20, sum=18632028 (in bytes)
91010 cdata objects: max=176, avg=38, min=12, sum=3523042 (in bytes)
2020 thread objects: max=2544, avg=1170, min=424, sum=2365216 (in bytes)
753 trace objects: max=3064, avg=410, min=160, sum=309184 (in bytes)
346 proto objects: max=810300, avg=3506, min=80, sum=1213100 (in bytes)
JIT state size: 11204 bytes
global state tmpbuf size: 16384 bytes
C type state size: 8664 bytes

My GC walker detected for total 958902136 bytes.
4497262 microseconds elapsed in the probe handler.
ERROR: probe overhead exceeded threshold
WARNING: Number of errors: 1, skipped probes: 115
WARNING: /usr/bin/staprun exited with status: 1
Pass 5: run failed.  [man error::pass5]

The sample-bt-leaks.sxx output looks as below:

[image: flame graph, attached as a.svg.zip]

agentzh commented 7 years ago

I'm afraid we are already well off topic here.

pavelnemirovsky commented 7 years ago

Yes, I agree. I apologize; I'll open another one.

huang19910425 commented 6 years ago

Can you help me see the mistake?

[root@1_201 sh]# ps aux | grep nginx
root 10025 0.0 0.0 510064 1708 ? Ss 18:31 0:00 nginx: master process /data/app/openresty/nginx/sbin/nginx -p /data/app/game_platform_union/ -c conf/nginx.conf
root 10026 0.0 0.2 539688 30748 ? S 18:31 0:00 nginx: worker process
root 10027 0.0 0.2 539688 30752 ? S 18:31 0:00 nginx: worker process
root 10028 0.3 0.2 539688 30752 ? S 18:31 0:00 nginx: worker process
root 10029 0.3 0.2 539688 30732 ? S 18:31 0:00 nginx: worker process
root 10281 0.0 0.0 112704 956 pts/3 S+ 18:31 0:00 grep --color=auto nginx

[root@1_201 sh]# /opt/stapxx/samples/lj-lua-stacks.sxx --arg time=30 --skip-badvars -x 10026 > /tmp/tmp.bt
Found exact match for libluajit: /data/app/openresty/luajit/lib/libluajit-5.1.so.2.1.0
WARNING: Start tracing 10026 (/data/app/openresty/nginx/sbin/nginx)
WARNING: Please wait for 30 seconds...
WARNING: Time's up. Quitting now...
WARNING: Found 444 JITted samples.

[root@1_201 sh]# /opt/openresty-systemtap-toolkit/fix-lua-bt /tmp/tmp.bt > /tmp/10026.bt
failed to open lua/resty/utils/db/mysql_help.lua for reading: No such file or directory

[root@1_201 sh]# ls -l /data/app/game_platform_union/lua/resty/utils/db/mysql_help.lua
-rwxr-xr-x 1 root root 6866 Jul 5 13:45 /data/app/game_platform_union/lua/resty/utils/db/mysql_help.lua

Why does it claim that this file does not exist?