monitoringartist / zabbix-docker-monitoring

:whale: Docker/Kubernetes/Mesos/Marathon/Chronos/LXC/LXD/Swarm container monitoring - Docker image, Zabbix template and C module
https://hub.docker.com/r/monitoringartist/zabbix-agent-xxl-limited/
GNU General Public License v2.0

agent crash with compiled module 3.4.10-3.4.15 #110

Open alexmirtoff opened 5 years ago

alexmirtoff commented 5 years ago
cat /etc/os-release 
NAME="SLES"
VERSION="12-SP3"
VERSION_ID="12.3"
PRETTY_NAME="SUSE Linux Enterprise Server 12 SP3"
docker version
Client:
 Version:           18.09.0
 API version:       1.39
 Go version:        go1.10.4
 Git commit:        33a45cd
 Built:             Wed Nov  7 00:25:11 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Enterprise
 Engine:
  Version:          18.09.0
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.4
  Git commit:       33a45cd
  Built:            Wed Nov  7 00:19:46 2018
  OS/Arch:          linux/amd64
  Experimental:     false
*** Error in `/usr/sbin/zabbix-agentd: listener #3 [processing request]': munmap_chunk(): invalid pointer: 0x00007f0fd32e4840 ***
======= Backtrace: =========
/lib64/libc.so.6(+0x740ef)[0x7f0fd3b9e0ef]
/lib64/libc.so.6(+0x79646)[0x7f0fd3ba3646]
/usr/lib/modules/zabbix_module_docker.so(zbx_module_docker_net+0x636)[0x7f0fd3704cb3]
/usr/sbin/zabbix-agentd: listener #3 [processing request](process+0x353)[0x4185e3]
/usr/sbin/zabbix-agentd: listener #3 [processing request](listener_thread+0x1ad)[0x41513d]
/usr/sbin/zabbix-agentd: listener #3 [processing request](zbx_thread_start+0x3e)[0x42c45e]
/usr/sbin/zabbix-agentd: listener #3 [processing request](MAIN_ZABBIX_ENTRY+0x2c3)[0x417883]
/usr/sbin/zabbix-agentd: listener #3 [processing request](daemon_start+0x1a9)[0x42cf09]
/usr/sbin/zabbix-agentd: listener #3 [processing request](main+0x9e)[0x40d1fe]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f0fd3b4a725]
/usr/sbin/zabbix-agentd: listener #3 [processing request](_start+0x29)[0x40d309]
======= Memory map: ========
00400000-00457000 r-xp 00000000 fe:01 1052                               /usr/sbin/zabbix-agentd
00656000-00657000 r--p 00056000 fe:01 1052                               /usr/sbin/zabbix-agentd
00657000-00659000 rw-p 00057000 fe:01 1052                               /usr/sbin/zabbix-agentd
00659000-0065e000 rw-p 00000000 00:00 0 
00e22000-00e43000 rw-p 00000000 00:00 0                                  [heap]
00e43000-00e47000 rw-p 00000000 00:00 0                                  [heap]
7f0fd30cb000-7f0fd30e1000 r-xp 00000000 fe:00 299                        /lib64/libgcc_s.so.1
7f0fd30e1000-7f0fd32e0000 ---p 00016000 fe:00 299                        /lib64/libgcc_s.so.1
7f0fd32e0000-7f0fd32e1000 r--p 00015000 fe:00 299                        /lib64/libgcc_s.so.1
7f0fd32e1000-7f0fd32e2000 rw-p 00016000 fe:00 299                        /lib64/libgcc_s.so.1
7f0fd32e2000-7f0fd32e7000 r-xp 00000000 fe:00 350                        /lib64/libnss_dns-2.22.so
7f0fd32e7000-7f0fd34e6000 ---p 00005000 fe:00 350                        /lib64/libnss_dns-2.22.so
7f0fd34e6000-7f0fd34e7000 r--p 00004000 fe:00 350                        /lib64/libnss_dns-2.22.so
7f0fd34e7000-7f0fd34e8000 rw-p 00005000 fe:00 350                        /lib64/libnss_dns-2.22.so
7f0fd34e8000-7f0fd34f3000 r-xp 00000000 fe:00 1191                       /lib64/libnss_files-2.22.so
7f0fd34f3000-7f0fd36f2000 ---p 0000b000 fe:00 1191                       /lib64/libnss_files-2.22.so
7f0fd36f2000-7f0fd36f3000 r--p 0000a000 fe:00 1191                       /lib64/libnss_files-2.22.so
7f0fd36f3000-7f0fd36f4000 rw-p 0000b000 fe:00 1191                       /lib64/libnss_files-2.22.so
7f0fd36f4000-7f0fd36fa000 rw-p 00000000 00:00 0 
7f0fd36fa000-7f0fd370c000 r-xp 00000000 fe:01 1182                       /usr/lib/modules/zabbix_module_docker.so
7f0fd370c000-7f0fd390b000 ---p 00012000 fe:01 1182                       /usr/lib/modules/zabbix_module_docker.so
7f0fd390b000-7f0fd390c000 r--p 00011000 fe:01 1182                       /usr/lib/modules/zabbix_module_docker.so
7f0fd390c000-7f0fd390d000 rw-p 00012000 fe:01 1182                       /usr/lib/modules/zabbix_module_docker.so
7f0fd390d000-7f0fd3925000 r-xp 00000000 fe:00 1631                       /lib64/libpthread-2.22.so
7f0fd3925000-7f0fd3b24000 ---p 00018000 fe:00 1631                       /lib64/libpthread-2.22.so
7f0fd3b24000-7f0fd3b25000 r--p 00017000 fe:00 1631                       /lib64/libpthread-2.22.so
7f0fd3b25000-7f0fd3b26000 rw-p 00018000 fe:00 1631                       /lib64/libpthread-2.22.so
7f0fd3b26000-7f0fd3b2a000 rw-p 00000000 00:00 0 
7f0fd3b2a000-7f0fd3cc5000 r-xp 00000000 fe:00 140                        /lib64/libc-2.22.so
7f0fd3cc5000-7f0fd3ec5000 ---p 0019b000 fe:00 140                        /lib64/libc-2.22.so
7f0fd3ec5000-7f0fd3ec9000 r--p 0019b000 fe:00 140                        /lib64/libc-2.22.so
7f0fd3ec9000-7f0fd3ecb000 rw-p 0019f000 fe:00 140                        /lib64/libc-2.22.so
7f0fd3ecb000-7f0fd3ecf000 rw-p 00000000 00:00 0 
7f0fd3ecf000-7f0fd3f3d000 r-xp 00000000 fe:01 1207                       /usr/lib64/libpcre.so.1.2.7
7f0fd3f3d000-7f0fd413c000 ---p 0006e000 fe:01 1207                       /usr/lib64/libpcre.so.1.2.7
7f0fd413c000-7f0fd413d000 r--p 0006d000 fe:01 1207                       /usr/lib64/libpcre.so.1.2.7
7f0fd413d000-7f0fd413e000 rw-p 0006e000 fe:01 1207                       /usr/lib64/libpcre.so.1.2.7
7f0fd413e000-7f0fd4152000 r-xp 00000000 fe:00 1659                       /lib64/libresolv-2.22.so
7f0fd4152000-7f0fd4351000 ---p 00014000 fe:00 1659                       /lib64/libresolv-2.22.so
7f0fd4351000-7f0fd4352000 r--p 00013000 fe:00 1659                       /lib64/libresolv-2.22.so
7f0fd4352000-7f0fd4353000 rw-p 00014000 fe:00 1659                       /lib64/libresolv-2.22.so
7f0fd4353000-7f0fd4355000 rw-p 00000000 00:00 0 
7f0fd4355000-7f0fd4357000 r-xp 00000000 fe:00 306                        /lib64/libdl-2.22.so
7f0fd4357000-7f0fd4557000 ---p 00002000 fe:00 306                        /lib64/libdl-2.22.so
7f0fd4557000-7f0fd4558000 r--p 00002000 fe:00 306                        /lib64/libdl-2.22.so
7f0fd4558000-7f0fd4559000 rw-p 00003000 fe:00 306                        /lib64/libdl-2.22.so
7f0fd4559000-7f0fd4654000 r-xp 00000000 fe:00 328                        /lib64/libm-2.22.so
7f0fd4654000-7f0fd4854000 ---p 000fb000 fe:00 328                        /lib64/libm-2.22.so
7f0fd4854000-7f0fd4855000 r--p 000fb000 fe:00 328                        /lib64/libm-2.22.so
7f0fd4855000-7f0fd4856000 rw-p 000fc000 fe:00 328                        /lib64/libm-2.22.so
7f0fd4856000-7f0fd4877000 r-xp 00000000 fe:00 38                         /lib64/ld-2.22.so
7f0fd4a07000-7f0fd4a61000 rw-s 00000000 00:05 229376                     /SYSV00000000 (deleted)
7f0fd4a61000-7f0fd4a66000 rw-p 00000000 00:00 0 
7f0fd4a75000-7f0fd4a76000 rw-p 00000000 00:00 0 
7f0fd4a76000-7f0fd4a77000 rw-p 00000000 00:00 0 
7f0fd4a77000-7f0fd4a78000 r--p 00021000 fe:00 38                         /lib64/ld-2.22.so
7f0fd4a78000-7f0fd4a79000 rw-p 00022000 fe:00 38                         /lib64/ld-2.22.so
7f0fd4a79000-7f0fd4a7a000 rw-p 00000000 00:00 0 
7ffda273c000-7ffda275d000 rw-p 00000000 00:00 0                          [stack]
7ffda279f000-7ffda27a2000 r--p 00000000 00:00 0                          [vvar]
7ffda27a2000-7ffda27a4000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]
 50660:20181212:134920.403 found metric TX-OK: 44000823
 50660:20181212:134920.403 Sending back [44000823]
 50660:20181212:134920.404 __zbx_zbx_setproctitle() title:'listener #1 [waiting for connection]'
 50658:20181212:134920.406 One child process died (PID:50662,exitcode/signal:6). Exiting ...
 50658:20181212:134920.406 zbx_on_exit() called
 50659:20181212:134920.406 Got signal [signal:15(SIGTERM),sender_pid:50658,sender_uid:0,reason:0]. Exiting ...
 50663:20181212:134920.406 Got signal [signal:15(SIGTERM),sender_pid:50658,sender_uid:0,reason:0]. Exiting ...
 50660:20181212:134920.406 Got signal [signal:15(SIGTERM),sender_pid:50658,sender_uid:0,reason:0]. Exiting ...
 50661:20181212:134920.407 Got signal [signal:15(SIGTERM),sender_pid:50658,sender_uid:0,reason:0]. Exiting ...
zabbix-agentd [50658]: Error waiting for process with PID 50662: [10] No child processes
 50658:20181212:134920.407 In zbx_dshm_destroy() shmid:-1
 50658:20181212:134920.407 End of zbx_dshm_destroy():SUCCEED
 50658:20181212:134920.407 In zbx_unload_modules()
 50658:20181212:134920.407 In zbx_module_uninit()
 50658:20181212:134920.408 End of zbx_unload_modules()
 50658:20181212:134920.408 Zabbix Agent stopped. Zabbix 3.4.15 (revision 86739).
jangaraj commented 5 years ago

Did you compile the module for your system with correct Zabbix version? Could you provide more logs before backtrace, please?

alexmirtoff commented 5 years ago

> Did you compile the module for your system with correct Zabbix version? Could you provide more logs before backtrace, please?

Yes, I compiled it against the correct version. These hosts run in manager-worker mode. On a host that is not part of a cluster, everything works fine. Some logs:

 65158:20181212:145638.254 Requested [docker.mem[d8dc14731a9ca28c2c8f4b2c3063db03f752b751ef40bf17d0c07e169a3e2918,total_cache]]
 65158:20181212:145638.254 In zbx_module_docker_mem()
 65158:20181212:145638.254 In zbx_module_docker_get_fci()
 65158:20181212:145638.254 Original full container id will be used
 65158:20181212:145638.254 Metric source file: /sys/fs/cgroup/memory/docker/d8dc14731a9ca28c2c8f4b2c3063db03f752b751ef40bf17d0c07e169a3e2918/memory.stat
 65158:20181212:145638.254 Looking metric total_cache in memory.stat file
 65158:20181212:145638.254 Id: d8dc14731a9ca28c2c8f4b2c3063db03f752b751ef40bf17d0c07e169a3e2918; metric: total_cache; value: 77406208
 65158:20181212:145638.254 Sending back [77406208]
 65158:20181212:145638.255 __zbx_zbx_setproctitle() title:'listener #3 [waiting for connection]'
 65158:20181212:145638.256 __zbx_zbx_setproctitle() title:'listener #3 [processing request]'
 65158:20181212:145638.257 Requested [docker.mem[3fd2b78b602d02a879dffb33a0073725d38dc04c48959a50b5b115dae7feba9b,total_rss]]
 65158:20181212:145638.257 In zbx_module_docker_mem()
 65158:20181212:145638.257 In zbx_module_docker_get_fci()
 65158:20181212:145638.257 Original full container id will be used
 65158:20181212:145638.257 Metric source file: /sys/fs/cgroup/memory/docker/3fd2b78b602d02a879dffb33a0073725d38dc04c48959a50b5b115dae7feba9b/memory.stat
 65158:20181212:145638.258 Looking metric total_rss in memory.stat file
 65158:20181212:145638.258 Id: 3fd2b78b602d02a879dffb33a0073725d38dc04c48959a50b5b115dae7feba9b; metric: total_rss; value: 73814016
 65158:20181212:145638.258 Sending back [73814016]
 65158:20181212:145638.258 __zbx_zbx_setproctitle() title:'listener #3 [waiting for connection]'
 65157:20181212:145638.260 __zbx_zbx_setproctitle() title:'listener #2 [processing request]'
 65157:20181212:145638.261 Requested [docker.mem[598fb024a76008b3919ba2debe37319d9a96d2a90dce521f23b6dc7c3dd2a648,total_swap]]
 65157:20181212:145638.261 In zbx_module_docker_mem()
 65157:20181212:145638.261 In zbx_module_docker_get_fci()
 65157:20181212:145638.261 Original full container id will be used
 65157:20181212:145638.261 Metric source file: /sys/fs/cgroup/memory/docker/598fb024a76008b3919ba2debe37319d9a96d2a90dce521f23b6dc7c3dd2a648/memory.stat
 65157:20181212:145638.261 Cannot open metric file: '/sys/fs/cgroup/memory/docker/598fb024a76008b3919ba2debe37319d9a96d2a90dce521f23b6dc7c3dd2a648/memory.stat'
 65157:20181212:145638.261 Sending back [ZBX_NOTSUPPORTED: Cannot open memory.stat file]
 65157:20181212:145638.261 __zbx_zbx_setproctitle() title:'listener #2 [waiting for connection]'
 65157:20181212:145638.263 __zbx_zbx_setproctitle() title:'listener #2 [processing request]'
 65157:20181212:145638.264 Requested [docker.up[f508d3c86f17820bf51dea6517045a1ce6dddc457d53ec397c61309ecd6b090e]]
 65157:20181212:145638.264 In zbx_module_docker_up()
 65157:20181212:145638.264 In zbx_module_docker_get_fci()
 65157:20181212:145638.264 Original full container id will be used
 65157:20181212:145638.264 Metric source file: /sys/fs/cgroup/cpu,cpuacct/docker/f508d3c86f17820bf51dea6517045a1ce6dddc457d53ec397c61309ecd6b090e/cpuacct.stat
 65157:20181212:145638.264 Cannot open metric file: '/sys/fs/cgroup/cpu,cpuacct/docker/f508d3c86f17820bf51dea6517045a1ce6dddc457d53ec397c61309ecd6b090e/cpuacct.stat', container doesn't run
 65157:20181212:145638.264 Sending back [0]
 65157:20181212:145638.264 __zbx_zbx_setproctitle() title:'listener #2 [waiting for connection]'
 65157:20181212:145638.266 __zbx_zbx_setproctitle() title:'listener #2 [processing request]'
 65157:20181212:145638.267 Requested [docker.xnet[f3a1997592d3b0dc7cad00e834759e8f699e9e96108d5d6dc0c3d5afe38701a3,eth0,RX-OK]]
 65157:20181212:145638.267 In zbx_module_docker_net()
 65157:20181212:145638.267 In zbx_module_docker_get_fci()
 65157:20181212:145638.267 Original full container id will be used
 65157:20181212:145638.267 netns file: /var/run/netns/zabbix_module_docker_f3a1997592d3b0dc7cad00e834759e8f699e9e96108d5d6dc0c3d5afe38701a3
 65157:20181212:145638.267 Tasks file: /sys/fs/cgroup/devices/docker/f3a1997592d3b0dc7cad00e834759e8f699e9e96108d5d6dc0c3d5afe38701a3/tasks
 65157:20181212:145638.267 Cannot open Docker tasks file: '/sys/fs/cgroup/devices/docker/f3a1997592d3b0dc7cad00e834759e8f699e9e96108d5d6dc0c3d5afe38701a3/tasks'
 65157:20181212:145638.267 Sending back [ZBX_NOTSUPPORTED: Cannot open Docker tasks file]
 65157:20181212:145638.267 __zbx_zbx_setproctitle() title:'listener #2 [waiting for connection]'
 65157:20181212:145638.273 __zbx_zbx_setproctitle() title:'listener #2 [processing request]'
 65157:20181212:145638.274 Requested [docker.xnet[71b227a3c00d0b6862cd82187d9bcd68be4698ece453bc90c3ff8dd6bc3b6f26,eth0,RX-OK]]
 65157:20181212:145638.274 In zbx_module_docker_net()
 65157:20181212:145638.274 In zbx_module_docker_get_fci()
 65157:20181212:145638.274 Original full container id will be used
 65157:20181212:145638.274 netns file: /var/run/netns/zabbix_module_docker_71b227a3c00d0b6862cd82187d9bcd68be4698ece453bc90c3ff8dd6bc3b6f26
 65157:20181212:145638.274 Tasks file: /sys/fs/cgroup/devices/docker/71b227a3c00d0b6862cd82187d9bcd68be4698ece453bc90c3ff8dd6bc3b6f26/tasks
*** Error in `/usr/sbin/zabbix-agentd: listener #2 [processing request]': munmap_chunk(): invalid pointer: 0x00007f9ea11ce840 ***
jangaraj commented 5 years ago

Problem is with docker.xnet. Did you fulfill requirements mentioned in the Readme?

Note 1: Root permissions (AllowRoot=1) are required, because net namespaces (/var/run/netns/) are created/used

Note 2: netstat is needed to be installed and available in PATH
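For anyone who wants to verify both notes from inside a process, here is a rough sketch in C. It is not taken from the module: `find_in_path` and `xnet_prereqs_ok` are hypothetical helpers, and checking the effective UID is only an approximation of "the agent runs with AllowRoot=1".

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if an executable called `name` is found in
 * one of the directories listed in PATH, 0 otherwise. Empty PATH entries
 * (which mean "current directory") are skipped for simplicity.
 */
int find_in_path(const char *name)
{
    const char *p = getenv("PATH");
    char candidate[4096];

    while (NULL != p && '\0' != *p)
    {
        size_t len = strcspn(p, ":");   /* length of this PATH entry */

        if (0 < len &&
            (size_t)snprintf(candidate, sizeof(candidate), "%.*s/%s",
                             (int)len, p, name) < sizeof(candidate) &&
            0 == access(candidate, X_OK))
            return 1;

        p += len;
        if (':' == *p)
            p++;                        /* step over the separator */
    }
    return 0;
}

/*
 * Hypothetical helper: return 1 if both README requirements appear to hold
 * for the current process (effective UID 0 and netstat in PATH).
 */
int xnet_prereqs_ok(void)
{
    return 0 == geteuid() && find_in_path("netstat");
}
```

Calling `xnet_prereqs_ok()` from a small test program run as the same user and with the same environment as the agent would show whether either requirement is missing. Note that a service manager may start the agent with a different PATH than an interactive shell.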

alexmirtoff commented 5 years ago
  1. AllowRoot=1 is set
  2. Netstat is installed and available.

Some network data appeared in Zabbix before the agent died.

jangaraj commented 5 years ago

Probably it is crashing somewhere in this part https://github.com/monitoringartist/zabbix-docker-monitoring/blob/dba2fb727e411493bcc4e540d5bac681836d12fc/src/modules/zabbix_module_docker/zabbix_module_docker.c#L1286-L1306

Probably a pointer passed to free() is not valid. It will require deeper investigation to confirm this.

forum77alive commented 4 years ago

I have the same problem, but with zabbix-agent 4.4.3 on Debian 9.9.

forum77alive commented 4 years ago

But after I downgraded zabbix-agent to 4.2.8 and compiled the .so, it worked!

Lucefron commented 3 years ago

Same problem. OS: Ubuntu 18.04, Debian 9, Debian 10; agent version: 5.0.12. zabbix_module_docker.so was downloaded from the master branch.

i-ky commented 2 years ago

It looks to me like the problem is here: https://github.com/monitoringartist/zabbix-docker-monitoring/blob/fd3f6e818e31989972f15fbe86079573fc1c6608/src/modules/zabbix_module_docker/zabbix_module_docker.c#L1274-L1280

If fgets() fails, the loop body is never executed, so first_task is never initialized, and the subsequent attempt to release its memory: https://github.com/monitoringartist/zabbix-docker-monitoring/blob/fd3f6e818e31989972f15fbe86079573fc1c6608/src/modules/zabbix_module_docker/zabbix_module_docker.c#L1290

...will lead to a crash.

The solution would be to convert this while loop into an if/else construct. However, I don't know what to put in the else branch, because I am looking at this purely from a C developer's perspective. @jangaraj and the rest, what does it mean if the tasks file is empty? How should the module behave in this case?
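A minimal sketch of that if/else shape, written as a standalone function rather than as a patch to the module. `read_first_task` is a hypothetical helper, and the empty-file behavior is only an assumption: it returns NULL so the caller can report the item as not supported, as the module already does when the tasks file cannot be opened.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the module): return the first line of
 * the tasks file as a heap-allocated string without the trailing newline,
 * or NULL if the file cannot be opened, is empty, or memory runs out.
 * The caller owns the result and may always pass it to free().
 */
char *read_first_task(const char *path)
{
    char line[64];
    char *first_task = NULL;    /* initialized, so free() is always safe */
    FILE *file = fopen(path, "r");

    if (NULL == file)
        return NULL;

    if (NULL != fgets(line, sizeof(line), file))
    {
        line[strcspn(line, "\n")] = '\0';   /* strip the newline, if any */
        first_task = malloc(strlen(line) + 1);
        if (NULL != first_task)
            strcpy(first_task, line);
    }
    /* else: nothing could be read; first_task stays NULL (assumed behavior) */

    fclose(file);
    return first_task;
}
```

With this shape the caller checks for NULL before using the value, and a later `free(first_task)` is harmless either way because `free(NULL)` is a no-op. Whether an empty tasks file should instead be treated like a stopped container is a decision for the maintainers.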