Closed skokhanovskiy closed 5 years ago
Please try simulating the agent's communication with the Docker API (ping) from the command line:
curl --unix-socket /var/run/docker.sock http:/_ping
I guess you mean http://localhost/_ping in the curl command line. Here it is:
# docker --version
Docker version 18.09.0, build 4d60db4
# curl -v --unix-socket /var/run/docker.sock http://localhost/_ping
* Trying /var/run/docker.sock...
* Connected to localhost (/var/run/docker.sock) port 80 (#0)
> GET /_ping HTTP/1.1
> Host: localhost
> User-Agent: curl/7.52.1
> Accept: */*
>
< HTTP/1.1 200 OK
< Api-Version: 1.39
< Docker-Experimental: false
< Ostype: linux
< Server: Docker/18.09.5 (linux)
< Date: Mon, 13 May 2019 12:31:37 GMT
< Content-Length: 2
< Content-Type: text/plain; charset=utf-8
<
* Curl_http_done: called premature == 0
* Connection #0 to host localhost left intact
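For comparison, the same ping can be sketched in C with explicit socket timeouts, so that a silent Docker daemon cannot block the caller forever. This is only an illustration, not the module's code: the function name `docker_ping` and its parameters are hypothetical, and the request string is the one the module logs (`GET /_ping HTTP/1.0`).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: send "GET /_ping" over a Unix socket.
 * timeout_sec bounds each send/recv call; pass a value > 0,
 * because 0 means "no timeout" for SO_RCVTIMEO/SO_SNDTIMEO.
 * Returns the number of bytes read by the first recv() (>= 0),
 * or -1 on any failure, including a timeout. */
ssize_t docker_ping(const char *sock_path, long timeout_sec,
                    char *out, size_t out_len)
{
    static const char request[] = "GET /_ping HTTP/1.0\r\n\r\n";
    const ssize_t request_len = (ssize_t)(sizeof(request) - 1);
    struct sockaddr_un addr;
    struct timeval tv;
    ssize_t n = -1;
    int fd;

    assert(sock_path != NULL && out != NULL);
    if (out_len == 0 || strlen(sock_path) >= sizeof(addr.sun_path))
        return -1;

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* Bound both directions before talking to the daemon. */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0 ||
        setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) != 0) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strncpy(addr.sun_path, sock_path, sizeof(addr.sun_path) - 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0 &&
        send(fd, request, (size_t)request_len, 0) == request_len) {
        /* Only the first chunk of the response is read in this sketch. */
        n = recv(fd, out, out_len - 1, 0);
        if (n >= 0)
            out[n] = '\0';
    }

    close(fd);
    return n;
}
```

Calling `docker_ping("/var/run/docker.sock", 5, buf, sizeof(buf))` should leave the start of the HTTP response in `buf` when the daemon answers, and return -1 when the socket is missing or the daemon stays silent past the timeout.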
Once again, note that restarting the zabbix-agent service fixes the problem once docker is already up and running. The described behavior occurs only while the server is booting.
~To work around this bug I added a delay between starting the docker and zabbix-agent services: a systemd timer unit that starts the zabbix-agent service 15 seconds after docker starts.~ To work around this issue I changed the configuration of the zabbix-agent systemd unit. This changes the boot order, so on boot systemd starts zabbix-agent only once the docker service is already running.
$ cat /etc/systemd/system/zabbix-agent.service.d/docker.conf
[Unit]
Wants=docker.service
After=docker.service
# systemctl daemon-reload
This helps, but the error in the module logic is still there.
Thank you for the additional details. Would you be able to create a pull request that fixes the broken module logic, please? It looks like a problem with the socket timeout.
@jangaraj #127 should fix this issue.
853:20190604:093448.726 Starting Zabbix Agent [orn-runners-01]. Zabbix 4.0.8 (revision 2b50c941de).
853:20190604:093448.726 **** Enabled features ****
853:20190604:093448.726 IPv6 support: YES
853:20190604:093448.726 TLS support: YES
853:20190604:093448.726 **************************
853:20190604:093448.726 using configuration file: /etc/zabbix/zabbix_agentd.conf
853:20190604:093448.726 In zbx_load_modules()
853:20190604:093448.726 loading module "/usr/lib/zabbix/modules/zabbix_module_docker.so"
853:20190604:093449.037 In zbx_module_api_version()
853:20190604:093449.037 In zbx_module_init()
853:20190604:093449.037 zabbix_module_docker v0.6.9, compilation time: Jun 4 2019 18:22:14
853:20190604:093449.037 In zbx_docker_dir_detect()
853:20190604:093449.037 Detected docker stat directory: /sys/fs/cgroup/
853:20190604:093449.037 Cannot detect used docker driver
853:20190604:093449.037 In zbx_docker_api_detect()
853:20190604:093449.037 In zbx_docker_perm()
853:20190604:093449.037 zabbix agent user has docker perm
853:20190604:093449.037 In zbx_module_docker_socket_query()
853:20190604:093449.037 Docker's socket query: GET /_ping HTTP/1.0
! 853:20190604:093519.298 Docker's socket response: [{}]
853:20190604:093519.298 Docker's socket doesn't work - only basic docker metrics are available
853:20190604:093519.298 In zbx_module_item_list()
853:20190604:093519.298 In zbx_module_item_timeout()
853:20190604:093519.298 cannot find "zbx_module_history_write_cbs()" function in module "zabbix_module_docker.so": /usr/lib/zabbix/modules/zabbix_module_docker.so: undefined symbol: zbx_module_history_write_cbs
853:20190604:093519.298 loaded modules: zabbix_module_docker.so
Waiting for review.
The Zabbix agent with the zabbix_module_docker.so module gets stuck on server reboot. Only a manual restart of the zabbix agent service helps in that case.
Version of zabbix agent:
Version of zabbix-docker-monitoring: latest compiled from master.
Logs of the stuck zabbix agent:
After that nothing happens for a long time.
I've tried adding the docker service as a dependency of the zabbix agent service in systemd.
~But this didn't help.~ Look at https://github.com/monitoringartist/zabbix-docker-monitoring/issues/121#issuecomment-491810760
I found that the socket timeouts are defined here: https://github.com/monitoringartist/zabbix-docker-monitoring/blob/27709c75b74e6404295b5b56b846b4e3b6d8f982/src/modules/zabbix_module_docker/zabbix_module_docker.c#L172-L180 The timeout values are taken from the fields of the `stimeout` struct, which is initialized in the `zbx_module_item_timeout` function: https://github.com/monitoringartist/zabbix-docker-monitoring/blob/27709c75b74e6404295b5b56b846b4e3b6d8f982/src/modules/zabbix_module_docker/zabbix_module_docker.c#L105-L119 But the zabbix agent calls this function only after this query. The absence of an `In zbx_module_item_timeout()` line in the logs confirms my hunch. I think the first ping query is made with a zero (i.e. infinite) timeout, and this request hangs indefinitely while docker is not yet fully started.
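To illustrate the suspected failure mode: on Linux, a zeroed `struct timeval` passed to `setsockopt()` with `SO_RCVTIMEO` means "never time out", so `recv()` on a silent peer blocks indefinitely. The sketch below assumes the module applies its timeout that way; it is not the module's code and not necessarily what the fix does. The names `set_recv_timeout`, `recv_times_out` and `FALLBACK_TIMEOUT_SEC` are hypothetical. A socket pair stands in for a Docker daemon that accepts the connection but does not answer.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical fallback, not a value taken from the module. */
#define FALLBACK_TIMEOUT_SEC 1

/* Apply a receive timeout to fd. A zero timeval would disable the
 * timeout entirely, so a non-positive configured value is replaced
 * by the fallback instead of being passed through. */
int set_recv_timeout(int fd, long configured_sec)
{
    struct timeval tv;

    assert(fd >= 0);
    tv.tv_sec = configured_sec > 0 ? configured_sec : FALLBACK_TIMEOUT_SEC;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Read from a peer that never writes.
 * Returns 1 if recv() gave up with a timeout, 0 if it returned
 * data or EOF, and -1 on any other error. */
int recv_times_out(long configured_sec)
{
    int sv[2];
    char buf[16];
    ssize_t n;
    int result;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    if (set_recv_timeout(sv[0], configured_sec) != 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* sv[1] stays open but silent, like a daemon that is still starting. */
    n = recv(sv[0], buf, sizeof(buf), 0);
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        result = 1;
    else if (n >= 0)
        result = 0;
    else
        result = -1;

    close(sv[0]);
    close(sv[1]);
    return result;
}
```

With the fallback in place, `recv_times_out(0)` returns after about one second instead of hanging, which is the behavior an uninitialized (zero) timeout would otherwise prevent.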