Closed wbrown64 closed 2 years ago
I can also confirm I am seeing this with the Agent (1) RHEL 5.0 image, deployed through the kubernetes.yaml file found in the 5.0 branch of this repository.
Could you show us the pod's logs?
Pod logs don't show much:
9205:20220421:221148.129 End of send_buffer():SUCCEED
9205:20220421:221148.129 zbx_setproctitle() title:'active checks #1 [idle 1 sec]'
9022:20220421:221148.669 Got signal [signal:15(SIGTERM),sender_pid:8972,sender_uid:1997,reason:0]. Exiting ...
9022:20220421:221148.669 zbx_on_exit() called
9022:20220421:221148.670 In zbx_dshm_destroy() shmid:-1
9022:20220421:221148.670 End of zbx_dshm_destroy():SUCCEED
9022:20220421:221148.670 In zbx_unload_modules()
9022:20220421:221148.670 End of zbx_unload_modules()
9022:20220421:221148.670 Zabbix Agent stopped. Zabbix 5.0.22 (revision 90ee9e3).
This is mainly because it is a Kubernetes termination rather than an application error or failure. The error in my original post came from describing the pod(s).
Please try to capture the full log :-)
[WARN tini (21035)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
Preparing Zabbix agent
Preparing Zabbix agent configuration file
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "PidFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogType": 'console'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogFileSize": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "DebugLevel": '3'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "SourceIP": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogRemoteCommands": '1'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Server": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ListenPort": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ListenIP": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ListenBacklog": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "StartAgents": '0'...added
Using 'x.x.x.x:10051' servers for active checks
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ServerActive": 'x.x.x.x:10051'...updated
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostInterface": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostInterfaceItem": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Hostname": 'ip-x.x.x.x.region-x.compute.internal'...updated
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostnameItem": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostMetadata": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostMetadataItem": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "RefreshActiveChecks": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "BufferSend": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "BufferSize": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "MaxLinesPerSecond": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Timeout": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Include": '/etc/zabbix/zabbix_agentd.d/'...added first occurrence
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "UnsafeUserParameters": '0'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LoadModulePath": '/var/lib/zabbix/modules/'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSConnect": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSAccept": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCAFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCRLFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSServerCertIssuer": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSServerCertSubject": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCertFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherAll": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherAll13": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherCert": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherCert13": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherPSK": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherPSK13": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSKeyFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSPSKIdentity": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSPSKFile": ''...removed
* Updating '/etc/zabbix/zabbix_agentd.conf' parameter "DenyKey": 'system.run[]'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "User": 'zabbix'...added
Starting Zabbix Agent [ip-x.x.x.x.region-x.compute.internal]. Zabbix 5.0.22 (revision 90ee9e3).
Press Ctrl+C to exit.
21082:20220421:221848.835 Starting Zabbix Agent [ip-x.x.x.x.region-x]. Zabbix 5.0.22 (revision 90ee9e3).
21082:20220421:221848.835 ** Enabled features
21082:20220421:221848.835 IPv6 support: YES
21082:20220421:221848.835 TLS support: YES
21082:20220421:221848.835 **
21082:20220421:221848.835 using configuration file: /etc/zabbix/zabbix_agentd.conf
21082:20220421:221848.835 agent #0 started [main process]
21209:20220421:221848.924 agent #1 started [collector]
21210:20220421:221848.924 agent #2 started [active checks #1]
21082:20220421:221857.538 Got signal [signal:15(SIGTERM),sender_pid:21035,sender_uid:1997,reason:0]. Exiting ...
21082:20220421:221857.538 Zabbix Agent stopped. Zabbix 5.0.22 (revision 90ee9e3).
StartAgents = 0? Please try to increase to at least 1.
Why? We do not wish to use passive checks.
Otherwise the Zabbix agent will not listen on port 10050. If you do not want that, please try a different way of checking whether the Zabbix agent is running, for example the status port 31999.
Will check this out later and update. Thank you!
Still getting the same connection refused on port 31999.
At the very least, isn't this a bug in the Helm deployment? The health probes default to port 10050, so they rely on the assumption that passive mode is enabled.
Please pass "ZBX_ENABLESTATUSPORT=true" to enable the port.
Looks like that is not working. I am passing this as an env variable:
Environment:
  ZBX_ENABLESTATUSPORT: true
For me it works:
** Preparing Zabbix agent
** Preparing Zabbix agent configuration file
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "PidFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogType": 'console'...added
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogFileSize": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "DebugLevel": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "SourceIP": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogRemoteCommands": ''...removed
** Using 'zabbix-server' servers for passive checks
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Server": 'zabbix-server'...updated
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "ListenPort": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "ListenIP": ''...removed
** Using 'zabbix-server:10051' servers for active checks
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "ServerActive": 'zabbix-server:10051'...updated
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "EnablePersistentBuffer": '0'...added
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "StatusPort": '31999'...added
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostInterface": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostInterfaceItem": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Hostname": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostnameItem": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostMetadata": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostMetadataItem": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "RefreshActiveChecks": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "BufferSend": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "BufferSize": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "MaxLinesPerSecond": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Timeout": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Include": '/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf'...updated
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Include": '/etc/zabbix/zabbix_agentd.d/'...added first occurrence
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "UnsafeUserParameters": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSConnect": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSAccept": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSCAFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSCRLFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSServerCertIssuer": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSServerCertSubject": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSCertFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSKeyFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSPSKIdentity": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSPSKFile": ''...removed
2022/04/22 20:33:36.762571 Starting Zabbix Agent 2 (6.0.3)
env is:
"Env": [
"ZBX_ENABLESTATUSPORT=true",
"PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"TERM=xterm",
"ZBX_VERSION=6.0.3",
"ZBX_SOURCES=https://git.zabbix.com/scm/zbx/zabbix.git"
],
Should that variable work on agent1?
No. I thought we were talking about agent2 :-)
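For agent2, the status-port approach from this exchange could be wired into the pod spec roughly as follows. This is a minimal sketch, not the chart's actual template: the container name and image reference are placeholders, the probe timings are borrowed from the agent1 spec posted later in this thread, and using a plain TCP check against the status port is an assumption. Only the variable name ZBX_ENABLESTATUSPORT and port 31999 come from the posts above.

```yaml
# Sketch only: container fragment for a Zabbix agent2 DaemonSet.
containers:
  - name: zabbix-agent2                               # placeholder name
    image: "<your-registry>/zabbix-agent2:<tag>"      # placeholder image reference
    env:
      - name: ZBX_ENABLESTATUSPORT                    # makes the entrypoint add StatusPort=31999
        value: "true"                                 # env values must be strings
      - name: ZBX_PASSIVE_ALLOW                       # passive checks stay off
        value: "false"
    livenessProbe:
      tcpSocket:
        port: 31999                                   # probe the status port instead of 10050
      initialDelaySeconds: 5
      periodSeconds: 2
      timeoutSeconds: 1
      failureThreshold: 3
```

With this layout the probe no longer depends on the passive listener, so active-only deployments should not be restarted for lack of a socket on 10050.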
We were, but I had been playing around with agent1 in the meantime. All good info though. Do you have an alternative solution for agent1? Thanks for the help so far.
Currently, I think, only port 10050 or the PID file (but the latter is not a good way)...
Hmm. So I can confirm that setting ZBX_ENABLESTATUSPORT=true, along with port 31999, works on agent2. However, I would still ask if you can help me find a solution for agent1, as the current deployment spec for the agent daemonset listed here https://github.com/zabbix/zabbix-docker/blob/5.0/kubernetes.yaml will not work if passive mode is disabled, which should be an accepted use case.
Enable passive mode with some dummy IP address for the Zabbix server :-)
Hate to be difficult, but with that config I am still getting connection refused on agent1, 5.0:
containers:
- env:
  - name: ZBX_DEBUGLEVEL
    value: "3"
  - name: ZBX_DENYKEY
    value: system.run[*]
  - name: ZBX_ALLOWKEY
  - name: ZBX_LOGREMOTECOMMANDS
    value: "1"
  - name: ZBX_SERVER_HOST
    value: x.x.x.x
  - name: ZBX_PASSIVE_ALLOW
    value: "true"
  - name: ZBX_PASSIVESERVERS
    value: 0.0.0.0/0
  - name: ZBX_ACTIVE_ALLOW
    value: "true"
  - name: ZBX_STARTAGENTS
    value: "0"
  - name: ZBX_HOSTNAME
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: metadata.name
  image: xxx-zabbix/zabbix-agent-1:5.0
  imagePullPolicy: Always
  livenessProbe:
    failureThreshold: 3
    initialDelaySeconds: 5
    periodSeconds: 2
    successThreshold: 1
    tcpSocket:
      port: 10050
    timeoutSeconds: 1
  name: zabbix-agent
  ports:
  - containerPort: 10050
    hostPort: 10050
    name: zabbix-agent
    protocol: TCP
error:
Liveness probe failed: dial tcp x.x.x.x:10050: connect: connection refused
ZBX_STARTAGENTS = 1
Ok, this will work for us. Thank you very much @dotneft, appreciate your help on this!
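Pulling the resolution together for agent1: the listener on 10050 only exists when passive mode is allowed and at least one passive agent process is started. A minimal sketch of the relevant part of the spec, assuming the same values as the one posted above with only ZBX_STARTAGENTS changed (the wide-open ZBX_PASSIVESERVERS range is copied from that spec, not a recommendation):

```yaml
# Sketch only: the agent1 fragment from this thread with the fix applied.
containers:
  - name: zabbix-agent
    image: xxx-zabbix/zabbix-agent-1:5.0    # image reference as posted above
    env:
      - name: ZBX_SERVER_HOST
        value: x.x.x.x
      - name: ZBX_ACTIVE_ALLOW
        value: "true"
      - name: ZBX_PASSIVE_ALLOW             # must be allowed so the agent binds 10050
        value: "true"
      - name: ZBX_PASSIVESERVERS
        value: "0.0.0.0/0"                  # copied from the posted spec; a dummy or narrower address also works per the thread
      - name: ZBX_STARTAGENTS
        value: "1"                          # was "0", which disables the passive listener
    ports:
      - name: zabbix-agent
        containerPort: 10050
        protocol: TCP
    livenessProbe:
      tcpSocket:
        port: 10050                         # succeeds once the listener exists
      initialDelaySeconds: 5
      periodSeconds: 2
      timeoutSeconds: 1
      failureThreshold: 3
```

Active checks keep working as before; the passive listener is there only to give the TCP probe something to connect to.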
SUMMARY
I am deploying Zabbix agent2 5.0 through Kubernetes as documented here https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F6.2. We are not using a proxy server at the moment; for this POC the agent sends data directly to our Zabbix server. The agent spins up and runs until the liveness and startup probes kick in. They run a TCP check on port 10050, which fails and restarts the pod (see the error below).
After some tweaks, we noticed that when passive mode is set to true, the startup and liveness probes succeed. We are guessing this is because the container is then actively listening on 10050. Note that this port is defined in the Kubernetes DaemonSet definition and, as far as we can tell, should be open (see the .yaml definition below).
It should be noted that the agent does connect to our Zabbix server (5.0.22) and sends data before getting killed by the k8s startup probe. It will also succeed and run if we use a port other than 10050. But we would like to use your product as intended.
OS / ENVIRONMENT / Used docker-compose files
Docker image from the Red Hat registry.
OS: RHEL
Zabbix Agent2 version: 5.0.22
EKS v1.19
CONFIGURATION
STEPS TO REPRODUCE
Deploy Zabbix for Kubernetes as documented here https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F6.2. Use the RHEL 5.0.22 agent2 images, which you can obtain from the Red Hat registry, and set the following values in zabbix_values.yaml:
zabbixProxy.enabled = false
zabbixAgent.env.ZBX_ACTIVE_ALLOW=true
zabbixAgent.env.ZBX_PASSIVE_ALLOW=false
EXPECTED RESULTS
Since your Helm deployment sets the startup and liveness probes to check TCP port 10050, I would expect that port to be open on the pod regardless of whether ZBX_PASSIVE_ALLOW is set to true.
ACTUAL RESULTS
If ZBX_PASSIVE_ALLOW=false, health probes fail.