monitoringartist / dockbix-agent-xxl

:whale: Dockerized Zabbix agent with Docker metrics and host metrics support for CoreOS, RHEL, CentOS, Ubuntu, Debian, Fedora, Boot2docker, Photon OS, Amazon Linux, ...
https://hub.docker.com/r/monitoringartist/dockbix-agent-xxl-limited/

Cannot open metric file: '/sys/fs/cgroup/memory/system.slice #30

Closed lukastheblack closed 7 years ago

lukastheblack commented 7 years ago

I am setting up a Zabbix server in our environment. Connections from the Zabbix server to non-Docker hosts/servers are working flawlessly. We are deploying the Dockbix agent via Kubernetes but are unable to get stats from containers, and the resulting errors are flooding our central logging server.

The most prominent errors are:

Cannot find the [Id] item in the received JSON object
Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//*Insert Docker instance name here */memory.stat'
Cannot open metric file: '/sys/fs/cgroup/cpu,cpuacct/system.slice//*Insert Docker instance name here */cpuacct.stat'

During startup, the Dockbix instance seems to initialize correctly and find its configuration. It creates 6 agents:

0 = main process
1 = collector
2-4 = listeners
5 = active checks

I am finding it difficult to troubleshoot without access to the container via Docker exec, but I have exported the logs from the central logging server and doctored them to remove personally identifying information.
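
Because the same two messages repeat throughout the export below, a tally per message can show how much each one contributes to the flood. The following is a minimal Python sketch, not part of Dockbix. It assumes each Zabbix agent entry has the `PID:YYYYMMDD:HHMMSS.mmm message` shape seen in these logs, and that the `""` sequence ending each message is an artifact of the log export. The name `summarize` is hypothetical.

```python
import re
from collections import Counter

# Zabbix agent entries look like "PID:YYYYMMDD:HHMMSS.mmm message".
LOG_RE = re.compile(r"(\d+):(\d{8}):(\d{6}\.\d{3}) (.*)")


def summarize(lines):
    """Count log messages, collapsing quoted paths so repeats group together."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if not match:
            continue  # not an agent log entry (banner text, blank line, ...)
        message = match.group(4)
        # Assumption: the export wraps each entry and closes it with "".
        message = message.split('""')[0]
        # Replace any single-quoted file path with a placeholder.
        message = re.sub(r"'[^']*'", "'<path>'", message)
        counts[message] += 1
    return counts
```

Calling `summarize(open("export.csv"))` on a saved export (the file name is only an example) returns a `Counter`, so `most_common()` lists the noisiest messages first.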


"Dockbix Agent XXL v0.0.1b limited version""," ,2017-06-19T08:59:25.542-0500,http:docker,_json
"{""line"":""Copyright (C) 2014-2017 Jan Garaj - www.monitoringartist.com""," ,2017-06-19T08:59:25.554-0500,http:docker,_json
"{""line"":""Freeware licence - Usage of this binary is restricted to official monitoringartist Docker images only.""," ,2017-06-19T08:59:25.559-0500,http:docker,_json
"{""line"":""Starting Zabbix Agent [Hostname]. Zabbix 3.2.4 Dockbix Agent XXL (2017-03-25) (revision ).""," ,2017-06-19T08:59:25.563-0500,http:docker,_json
"{""line"":""Press Ctrl+C to exit.""," ,2017-06-19T08:59:25.567-0500,http:docker,_json
"{""line"":""""," ,2017-06-19T08:59:25.572-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.556 Starting Zabbix Agent [Hostname]. Zabbix 3.2.4 Dockbix Agent XXL (2017-03-25) (revision ).""," ,2017-06-19T08:59:25.576-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.557 **** Enabled features ****""," ,2017-06-19T08:59:25.583-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.557 IPv6 support:          YES""," ,2017-06-19T08:59:25.588-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.557 TLS support:            NO""," ,2017-06-19T08:59:25.594-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.557 **************************""," ,2017-06-19T08:59:25.598-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.557 using configuration file: /etc/zabbix/zabbix_agentd.conf""," ,2017-06-19T08:59:25.604-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.562 loaded modules: zabbix_module_docker.so, zabbix_module_stress.so""," ,2017-06-19T08:59:25.604-0500,http:docker,_json
"{""line"":"" 10315:20170619:135925.563 agent #0 started [main process]""," ,2017-06-19T08:59:25.607-0500,http:docker,_json
"{""line"":"" 10316:20170619:135925.564 agent #1 started [collector]""," ,2017-06-19T08:59:25.609-0500,http:docker,_json
"{""line"":"" 10317:20170619:135925.564 agent #2 started [listener #1]""," ,2017-06-19T08:59:25.619-0500,http:docker,_json
"{""line"":"" 10318:20170619:135925.565 agent #3 started [listener #2]""," ,2017-06-19T08:59:25.627-0500,http:docker,_json
"{""line"":"" 10319:20170619:135925.565 agent #4 started [listener #3]""," ,2017-06-19T08:59:25.631-0500,http:docker,_json
"{""line"":"" 10320:20170619:135925.565 agent #5 started [active checks #1]""," ,2017-06-19T08:59:25.634-0500,http:docker,_json
"{""line"":"" 10320:20170619:135925.568 active check configuration update from [shp-corp-zab.corporate:10051] started to fail (cannot resolve [shp-corp-zab.corporate])""," ,2017-06-19T08:59:25.636-0500,http:docker,_json
"{""line"":"" 10317:20170619:140034.310 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:34.311-0500,http:docker,_json
"{""line"":"" 10317:20170619:140034.311 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:34.343-0500,http:docker,_json
"{""line"":"" 10318:20170619:140034.635 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:34.636-0500,http:docker,_json
"{""line"":"" 10318:20170619:140034.636 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance-904c-005056bd0e22_25a1f918/memory.stat'""," ,2017-06-19T09:00:34.638-0500,http:docker,_json
"{""line"":"" 10318:20170619:140034.653 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:34.653-0500,http:docker,_json
"{""line"":"" 10318:20170619:140034.653 Cannot open metric file: '/sys/fs/cgroup/cpu,cpuacct/system.slice//$DockerInstance/cpuacct.stat'""," ,2017-06-19T09:00:34.656-0500,http:docker,_json
"{""line"":"" 10319:20170619:140035.713 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:35.713-0500,http:docker,_json
"{""line"":"" 10319:20170619:140035.713 Cannot open metric file: '/sys/fs/cgroup/cpu,cpuacct/system.slice//DockerInstance/cpuacct.stat'""," ,2017-06-19T09:00:35.715-0500,http:docker,_json
"{""line"":"" 10318:20170619:140035.727 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:35.727-0500,http:docker,_json
"{""line"":"" 10318:20170619:140035.727 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:35.728-0500,http:docker,_json
"{""line"":"" 10317:20170619:140036.731 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:36.731-0500,http:docker,_json
"{""line"":"" 10317:20170619:140036.731 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:36.733-0500,http:docker,_json
"{""line"":"" 10318:20170619:140036.781 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:36.781-0500,http:docker,_json
"{""line"":"" 10318:20170619:140036.781 Cannot open metric file: '/sys/fs/cgroup/cpu,cpuacct/system.slice//DockerInstance/cpuacct.stat'""," ,2017-06-19T09:00:36.783-0500,http:docker,_json
"{""line"":"" 10319:20170619:140037.840 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:37.841-0500,http:docker,_json
"{""line"":"" 10319:20170619:140037.840 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:37.842-0500,http:docker,_json
"{""line"":"" 10318:20170619:140037.841 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:37.851-0500,http:docker,_json
"{""line"":"" 10318:20170619:140037.841 Cannot open metric file: '/sys/fs/cgroup/cpu,cpuacct/system.slice//DockerInstance/cpuacct.stat'""," ,2017-06-19T09:00:37.855-0500,http:docker,_json
"{""line"":"" 10319:20170619:140038.880 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:38.880-0500,http:docker,_json
"{""line"":"" 10319:20170619:140038.880 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:38.882-0500,http:docker,_json
"{""line"":"" 10317:20170619:140038.880 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:38.883-0500,http:docker,_json
"{""line"":"" 10317:20170619:140038.881 Cannot open metric file: '/sys/fs/cgroup/cpu,cpuacct/system.slice//DockerInstance/cpuacct.stat'""," ,2017-06-19T09:00:38.884-0500,http:docker,_json
"{""line"":"" 10318:20170619:140039.891 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:39.891-0500,http:docker,_json
"{""line"":"" 10318:20170619:140039.891 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice/docker-/dockbix.scope/memory.stat'""," ,2017-06-19T09:00:39.895-0500,http:docker,_json
"{""line"":"" 10317:20170619:140039.903 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:39.903-0500,http:docker,_json
"{""line"":"" 10317:20170619:140039.903 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:39.907-0500,http:docker,_json
"{""line"":"" 10319:20170619:140040.912 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:40.913-0500,http:docker,_json
"{""line"":"" 10319:20170619:140040.913 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:40.916-0500,http:docker,_json
"{""line"":"" 10317:20170619:140040.914 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:40.918-0500,http:docker,_json
"{""line"":"" 10317:20170619:140040.914 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:40.918-0500,http:docker,_json
"{""line"":"" 10317:20170619:140040.952 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:40.953-0500,http:docker,_json
"{""line"":"" 10318:20170619:140040.954 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:40.956-0500,http:docker,_json
"{""line"":"" 10318:20170619:140040.954 Cannot open metric file: '/sys/fs/cgroup/memory/system.slice//DockerInstance/memory.stat'""," ,2017-06-19T09:00:40.960-0500,http:docker,_json
"{""line"":"" 10317:20170619:140041.928 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:41.928-0500,http:docker,_json
"{""line"":"" 10318:20170619:140041.928 Cannot find the [Id] item in the received JSON object""," ,2017-06-19T09:00:41.932-0500,http:docker,_json

jangaraj commented 7 years ago

Sorry, I've asked you to follow https://github.com/monitoringartist/dockbix-agent-xxl#support Did you read it? You didn't provide logs with DebugLevel=5! Please save my time and read README first.

Closing, because the user didn't read the README.

lukastheblack commented 7 years ago

I set debug level to 5, see the pastebin link for logging of roughly 5 minutes of uptime.

https://pastebin.com/Wsy0w4va

jangaraj commented 7 years ago
   11:20170619:183658.072 In zbx_docker_dir_detect()
   11:20170619:183658.073 Cannot detect used docker driver

is your problem.

lukastheblack commented 7 years ago

Ok, what driver specifically are we talking about here? We are running docker version 1.12.6

jangaraj commented 7 years ago

Docker execution driver.

Are you able to run non dockerized zabbix agent with docker monitoring module on that host without any issue?

lukastheblack commented 7 years ago

Such as installing it via yum? When I run docker info, I find 3 drivers listed, though I am assuming these are NOT what you're talking about:

Storage Driver: devicemapper
Logging Driver: splunk
Cgroup Driver: systemd
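
The `Cgroup Driver` value matters here because it determines where Docker places each container's cgroup, and therefore where files such as `memory.stat` live. A minimal Python sketch of the usual cgroup v1 layouts for plain Docker follows. The function name `stat_path` is hypothetical, and Kubernetes-managed containers may sit under a different parent cgroup, so treat the result as a starting point for checking paths by hand rather than as what the Dockbix module computes.

```python
import posixpath

CGROUP_ROOT = "/sys/fs/cgroup"  # cgroup v1 mount point, as seen in the errors above


def stat_path(cgroup_driver, controller, container_id, stat_file):
    """Build the cgroup v1 path where plain Docker keeps a container's stat file."""
    if not container_id:
        raise ValueError("a full container ID is required")
    if cgroup_driver == "systemd":
        # systemd driver: system.slice/docker-<id>.scope
        relative = posixpath.join("system.slice", f"docker-{container_id}.scope")
    elif cgroup_driver == "cgroupfs":
        # cgroupfs driver: docker/<id>
        relative = posixpath.join("docker", container_id)
    else:
        raise ValueError(f"unknown cgroup driver: {cgroup_driver}")
    return posixpath.join(CGROUP_ROOT, controller, relative, stat_file)
```

With the systemd driver reported here, `stat_path("systemd", "memory", container_id, "memory.stat")` yields a path under `system.slice/docker-<id>.scope/`. The failing paths in the original report have an empty or incomplete segment at that position.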

jangaraj commented 7 years ago

Yes, use yum/apt ... to install non dockerized version.

lukastheblack commented 7 years ago

Yes. It starts correctly and is able to communicate with the Zabbix server.

3591:20170619:143559.350 Zabbix Agent stopped. Zabbix 2.0.20 (revision 64242).
6414:20170619:143559.379 Starting Zabbix Agent [shp-corp-shpyd01.swansonhealth.com]. Zabbix 2.0.20 (revision 64242).
6414:20170619:143559.379 In init_collector_data()
6414:20170619:143559.380 End of init_collector_data()
6415:20170619:143559.380 agent #0 started [collector]
6415:20170619:143559.380 In init_cpu_collector()
6415:20170619:143559.380 End of init_cpu_collector():SUCCEED
6415:20170619:143559.381 In update_cpustats()
6416:20170619:143559.381 agent #1 started [listener]
6417:20170619:143559.381 agent #2 started [listener]
6418:20170619:143559.381 agent #3 started [listener]
6415:20170619:143559.381 End of update_cpustats()
6419:20170619:143559.381 agent #4 started [active checks]
6419:20170619:143559.381 In init_active_metrics()
6419:20170619:143559.381 Buffer: first allocation for 100 elements
6419:20170619:143559.381 In send_buffer() host:'192.168.1.113' port:10051 values:0/100
6419:20170619:143559.381 End of send_buffer():SUCCEED
6419:20170619:143559.381 refresh_active_checks() host:'192.168.1.113' port:10051
6419:20170619:143559.382 sending [{ "request":"active checks", "host":"CENSORED"}]
6419:20170619:143559.382 before read
6419:20170619:143559.386 got [{"response":"success","data":[]}]
6419:20170619:143559.386 In parse_list_of_checks()
6419:20170619:143559.386 In disable_all_metrics()
6419:20170619:143559.387 In process_active_checks() server:'192.168.1.113' port:10051)
6419:20170619:143559.387 End of process_active_checks()
6419:20170619:143559.387 In get_min_nextcheck()
6419:20170619:143559.387 Sleeping for 1 second(s)
6415:20170619:143600.381 In update_cpustats()
6415:20170619:143600.382 End of update_cpustats()
6419:20170619:143600.387 In send_buffer() host:'192.168.1.113' port:10051 values:0/100
6419:20170619:143600.387 End of send_buffer():SUCCEED
6419:20170619:143600.387 Sleeping for 1 second(s)
6415:20170619:143601.382 In update_cpustats()
6415:20170619:143601.382 End of update_cpustats()
6419:20170619:143601.387 In send_buffer() host:'192.168.1.113' port:10051 values:0/100
6419:20170619:143601.387 End of send_buffer():SUCCEED
6419:20170619:143601.388 Sleeping for 1 second(s)

jangaraj commented 7 years ago

Again, you are wasting my time. I asked for: non dockerized zabbix agent with docker monitoring module. You didn't start the agent with the docker module. I'm not free Zabbix support, so please follow my instructions carefully; otherwise any support for you must be paid.

lukastheblack commented 7 years ago

I am not trying to get support for Zabbix. The problem I have is with Dockbix, which you wrote, correct? I have set up all the other servers without issue and have them reporting back to Zabbix. I apologize, but the way you worded the previous reply is very difficult to parse. I will attempt to find the docker monitoring module.

lukastheblack commented 7 years ago

I have tried both the version provided on your page and compiling the module myself, and in both cases I am getting the error "undefined symbol: zbx_alarm_timed_out". Any thoughts on this? My version of Zabbix agent is 3.2, as is the version I downloaded/compiled.

28774:20170619:162301.307 Starting Zabbix Agent [Zabbix server]. Zabbix 3.2.0 (revision 62485).
28774:20170619:162301.307 Enabled features
28774:20170619:162301.307 IPv6 support: YES
28774:20170619:162301.307 TLS support: YES
28774:20170619:162301.307 **
28774:20170619:162301.307 using configuration file: /etc/zabbix/zabbix_agentd.conf
28774:20170619:162301.307 In zbx_load_modules()
28774:20170619:162301.307 loading module "/var/lib/zabbix/modules//zabbix_module_docker.so"
28774:20170619:162301.307 cannot load module "zabbix_module_docker.so": /var/lib/zabbix/modules//zabbix_module_docker.so: undefined symbol: zbx_alarm_timed_out
28774:20170619:162301.307 End of zbx_load_modules():FAIL
28774:20170619:162301.307 loading modules failed, exiting...

jangaraj commented 7 years ago

Please use package provided by Zabbix (http://repo.zabbix.com/), don't use Zabbix package provided by your Linux distribution.

lukastheblack commented 7 years ago

I used this method to install: https://www.zabbix.com/documentation/3.2/manual/installation/install_from_packages/repository_installation - which appears to point to the same location, since I added it as an external repository. See the installed version below versus the link I followed from your page; this should be the same version.

http://repo.zabbix.com/zabbix/3.2/rhel/7/x86_64/zabbix-agent-3.2.0-1.el7.x86_64.rpm

Installed Packages
Name : zabbix-agent
Arch : x86_64
Version : 3.2.6
Release : 1.el7
Size : 1.3 M
Repo : installed
From repo : zabbix
Summary : Zabbix Agent
URL : http://www.zabbix.com/
License : GPLv2+
Description : Zabbix agent to be installed on monitored systems.

jangaraj commented 7 years ago

Dunno. Pls use 3.2.6 http://repo.zabbix.com/zabbix/3.2/rhel/7/x86_64/zabbix-agent-3.2.6-1.el7.x86_64.rpm

lukastheblack commented 7 years ago

zabbix_agentd --version
zabbix_agentd (daemon) (Zabbix) 3.2.6
Revision 67849 4 May 2017, compilation time: May 6 2017 00:30:54

Copyright (C) 2017 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later http://gnu.org/licenses/gpl.html.
This is free software: you are free to change and redistribute it according to the license.
There is NO WARRANTY, to the extent permitted by law.

lukastheblack commented 7 years ago

I was able to start Zabbix agent with the Docker module after confirming the version above, and I am attaching the logs. https://pastebin.com/0wgYp0EC

jangaraj commented 7 years ago

No problem there. Now run plain dockbix container (see README https://github.com/monitoringartist/dockbix-agent-xxl, no Kubernetes) in debug mode and provide logs again please?

lukastheblack commented 7 years ago

Got it, it seemed to start OK with the following command:

docker run --name=zabbix-agent-xxl -h $(hostname) --privileged -p 10050:10050 -v /:/rootfs -v /var/run:/var/run -e "ZA_DebugLevel=5" -e "ZA_Server=$IP OF SERVER" --log-driver journald -d monitoringartist/dockbix-agent-xxl-limited:latest

Logs : https://pastebin.com/cw5x3hjk

jangaraj commented 7 years ago

Standalone zabbix agent + plain Dockbix container:

  1588:20170620:091653.323 In zbx_docker_dir_detect()
  1588:20170620:091653.323 Detected docker stat directory: /sys/fs/cgroup/
  1588:20170620:091653.323 Detected used docker driver dir: system.slice/
  1588:20170620:091653.323 Detected systemd docker - prefix/suffix will be used
  1588:20170620:091653.323 Detected JoinController cpu,cpuacct

=> stat cgroup directory is detected without problem

Your Dockbix container managed by Kubernetes:

  11:20170619:183658.072 In zbx_docker_dir_detect()
   11:20170619:183658.073 Cannot detect used docker driver

=> The module is not able to detect the stat cgroup directory.

=> This is a problem with your Kubernetes configuration. It's not an issue with the Dockbix project or the zabbix docker module. Unfortunately, Kubernetes orchestration is not in the scope of this project.

Yes, I can probably also help you with Dockbix deployment on Kubernetes. But it's very specific, so it must be paid support.
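
The comparison above comes down to a few log lines written around `zbx_docker_dir_detect()`. As a rough aid, the Python sketch below scans a DebugLevel=5 log for those lines. The helper name `detection_result` is hypothetical, and the matched strings are taken only from the excerpts in this thread, so other module versions may word them differently.

```python
def detection_result(log_lines):
    """Summarize what the docker module logged about cgroup directory detection."""
    result = {"detected": None, "stat_dir": None, "driver_dir": None}
    for line in log_lines:
        if "Cannot detect used docker driver" in line:
            # Failure case, as in the Kubernetes-managed container log.
            result["detected"] = False
        elif "Detected docker stat directory:" in line:
            result["stat_dir"] = line.split("directory:", 1)[1].strip()
        elif "Detected used docker driver dir:" in line:
            # Success case, as in the standalone agent and plain container logs.
            result["driver_dir"] = line.split("dir:", 1)[1].strip()
            result["detected"] = True
    return result
```

A result with `detected` set to `False` matches the Kubernetes case quoted above; `None` means the log contained neither message, for example because the debug level was too low.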

lukastheblack commented 7 years ago

I am not sure what happened, but the driver loaded correctly after I set ZA_Debug to 5; nothing else changed. I am now getting a lot of the previous metric errors for Dockbix containers, but not for others. Is there an option to filter these hosts from discovery within the Dockbix configuration? I know how to filter them in Zabbix, but that will not stop the log messages, which is a concern for our environment. Let me know if you have any thoughts, thank you.

jangaraj commented 7 years ago

Nope. See the Zabbix documentation on how LLD (low-level discovery) works.

lukastheblack commented 7 years ago

Thank you for your assistance in this matter. Please close this if you have not already.