zabbix / zabbix-docker

Official Zabbix Dockerfiles
https://www.zabbix.com
GNU Affero General Public License v3.0

Zabbix Agent2 not listening on port 10050 unless PassiveMode is True #952

Closed: wbrown64 closed this issue 2 years ago

wbrown64 commented 2 years ago
SUMMARY

I am deploying Zabbix Agent 2 5.0 on Kubernetes as documented at https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F6.2. We are not using a proxy server at the moment; for this POC the agent sends data directly to our Zabbix server. The agent starts and runs until the startup and liveness probes run their TCP check on port 10050. The check fails and Kubernetes restarts the pod with the error below.

Startup probe failed: dial tcp x.x.x.x:10050: connect: connection refused

After some experimentation, we noticed that when passive mode is set to true, the startup and liveness probes succeed. We assume this is because the agent then actively listens on port 10050. Note that this port is declared in the Kubernetes DaemonSet definition and, as far as we can tell from the configuration, should be open (see the YAML definition below).

The agent does connect to our Zabbix server (5.0.22) and sends data before the Kubernetes startup probe kills it. It also runs successfully if we use a port other than 10050, but we would like to use your product as intended.

OS / ENVIRONMENT / Used docker-compose files

Docker image from the Red Hat registry.
OS: RHEL
Zabbix Agent 2 version: 5.0.22
EKS: v1.19

CONFIGURATION
spec:
      automountServiceAccountToken: false
      containers:
      - env:
        - name: ZBX_HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        - name: ZBX_SERVER_HOST
          value:  xxx
        - name: ZBX_SERVER_PORT
          value: "10051"
        - name: ZBX_PASSIVE_ALLOW
          value: "false"
        - name: ZBX_PASSIVESERVERS
        - name: ZBX_ACTIVE_ALLOW
          value: "true"
        - name: ZBX_TIMEOUT
          value: "4"
        image: 
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          periodSeconds: 10
          successThreshold: 1
          tcpSocket:
            port: 10050
          timeoutSeconds: 3
        name: zabbix-agent
        ports:
        - containerPort: 10050
          hostPort: 10050
          name: zabbix-agent
          protocol: TCP
        resources: {}
        startupProbe:
          failureThreshold: 5
          initialDelaySeconds: 10
          periodSeconds: 5
          successThreshold: 1
          tcpSocket:
            port: 10050
          timeoutSeconds: 3
STEPS TO REPRODUCE

Deploy Zabbix for Kubernetes as documented at https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F6.2. Use the RHEL 5.0.22 Agent 2 images available from the Red Hat registry, and set the following values in zabbix_values.yaml:

zabbixProxy.enabled = false
zabbixAgent.env.ZBX_ACTIVE_ALLOW=true
zabbixAgent.env.ZBX_PASSIVE_ALLOW=false

EXPECTED RESULTS

Since your Helm deployment configures the startup and liveness probes to check TCP port 10050, I would expect that port to be open on the pod regardless of whether ZBX_PASSIVE_ALLOW is set to true.

ACTUAL RESULTS

If ZBX_PASSIVE_ALLOW=false, health probes fail.

Startup probe failed: dial tcp x.x.x.x:10050: connect: connection refused
wbrown64 commented 2 years ago

I can also confirm that I see this with the Agent (1) RHEL 5.0 image, deploying through the kubernetes.yaml file in the 5.0 branch of this repository.

dotneft commented 2 years ago

Could you show us the pod's logs?

wbrown64 commented 2 years ago

Pod logs don't show much:

9205:20220421:221148.129 End of send_buffer():SUCCEED
9205:20220421:221148.129 zbx_setproctitle() title:'active checks #1 [idle 1 sec]'
9022:20220421:221148.669 Got signal [signal:15(SIGTERM),sender_pid:8972,sender_uid:1997,reason:0]. Exiting ...
9022:20220421:221148.669 zbx_on_exit() called
9022:20220421:221148.670 In zbx_dshm_destroy() shmid:-1
9022:20220421:221148.670 End of zbx_dshm_destroy():SUCCEED
9022:20220421:221148.670 In zbx_unload_modules()
9022:20220421:221148.670 End of zbx_unload_modules()
9022:20220421:221148.670 Zabbix Agent stopped. Zabbix 5.0.22 (revision 90ee9e3).

This is mainly because the pod is terminated by Kubernetes rather than failing with an application error. The error in my original post came from describing the pod(s).

dotneft commented 2 years ago

Please try to capture the full log :-)

wbrown64 commented 2 years ago

[WARN tini (21035)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.
Preparing Zabbix agent
Preparing Zabbix agent configuration file
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "PidFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogType": 'console'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogFileSize": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "DebugLevel": '3'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "SourceIP": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LogRemoteCommands": '1'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Server": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ListenPort": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ListenIP": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ListenBacklog": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "StartAgents": '0'...added
Using 'x.x.x.x:10051' servers for active checks
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "ServerActive": 'x.x.x.x:10051'...updated
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostInterface": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostInterfaceItem": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Hostname": 'ip-x.x.x.x.region-x.compute.internal'...updated
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostnameItem": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostMetadata": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "HostMetadataItem": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "RefreshActiveChecks": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "BufferSend": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "BufferSize": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "MaxLinesPerSecond": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Timeout": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "Include": '/etc/zabbix/zabbix_agentd.d/'...added first occurrence
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "UnsafeUserParameters": '0'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "LoadModulePath": '/var/lib/zabbix/modules/'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSConnect": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSAccept": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCAFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCRLFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSServerCertIssuer": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSServerCertSubject": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCertFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherAll": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherAll13": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherCert": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherCert13": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherPSK": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSCipherPSK13": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSKeyFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSPSKIdentity": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "TLSPSKFile": ''...removed
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "DenyKey": 'system.run[*]'...added
Updating '/etc/zabbix/zabbix_agentd.conf' parameter "User": 'zabbix'...added
Starting Zabbix Agent [ip-x.x.x.x.region-x.compute.internal]. Zabbix 5.0.22 (revision 90ee9e3).
Press Ctrl+C to exit.
21082:20220421:221848.835 Starting Zabbix Agent [ip-x.x.x.x.region-x]. Zabbix 5.0.22 (revision 90ee9e3).
21082:20220421:221848.835 ** Enabled features
21082:20220421:221848.835 IPv6 support: YES
21082:20220421:221848.835 TLS support: YES
21082:20220421:221848.835 **
21082:20220421:221848.835 using configuration file: /etc/zabbix/zabbix_agentd.conf
21082:20220421:221848.835 agent #0 started [main process]
21209:20220421:221848.924 agent #1 started [collector]
21210:20220421:221848.924 agent #2 started [active checks #1]
21082:20220421:221857.538 Got signal [signal:15(SIGTERM),sender_pid:21035,sender_uid:1997,reason:0]. Exiting ...
21082:20220421:221857.538 Zabbix Agent stopped. Zabbix 5.0.22 (revision 90ee9e3).

dotneft commented 2 years ago

StartAgents = 0? Please try increasing it to at least 1.

wbrown64 commented 2 years ago

Why? We do not want to use passive checks.

dotneft commented 2 years ago

Otherwise the Zabbix agent will not listen on port 10050. If you do not want that, please try a different way to check whether the Zabbix agent is running, for example the status port 31999.

wbrown64 commented 2 years ago

Will check this out later and update. Thank you!

wbrown64 commented 2 years ago

Still getting the same connection refused on port 31999.

At the very least, isn't this a bug in the Helm deployment? The health probes, which default to port 10050, assume that passive mode is enabled.

dotneft commented 2 years ago

Please pass "ZBX_ENABLESTATUSPORT=true" to enable the port.
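For agent2, the two suggestions above (probe the status port instead of 10050, and pass ZBX_ENABLESTATUSPORT=true) might be combined roughly as follows. This is a minimal sketch of the relevant container fields, not the chart's official values: it assumes the probe timings from the original DaemonSet are kept and only the probed port changes. As the rest of this thread shows, ZBX_ENABLESTATUSPORT is honoured by the agent2 image only.

```yaml
# Sketch: agent2 container fields with passive checks disabled.
# The probes target the agent2 status port (31999) rather than 10050,
# because with passive checks off nothing listens on 10050.
env:
  - name: ZBX_PASSIVE_ALLOW
    value: "false"
  - name: ZBX_ACTIVE_ALLOW
    value: "true"
  # Makes the entrypoint add StatusPort=31999 to zabbix_agent2.conf (agent2 only)
  - name: ZBX_ENABLESTATUSPORT
    value: "true"
startupProbe:
  tcpSocket:
    port: 31999
  initialDelaySeconds: 10
  periodSeconds: 5
  failureThreshold: 5
  timeoutSeconds: 3
livenessProbe:
  tcpSocket:
    port: 31999
  periodSeconds: 10
  failureThreshold: 3
  timeoutSeconds: 3
```

Environment variable values are quoted because Kubernetes requires them to be strings.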

wbrown64 commented 2 years ago

It looks like that is not working. I am passing it as an environment variable:

Environment:
  ZBX_ENABLESTATUSPORT: true

dotneft commented 2 years ago

For me it works:

** Preparing Zabbix agent
** Preparing Zabbix agent configuration file
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "PidFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogType": 'console'...added
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogFileSize": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "DebugLevel": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "SourceIP": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "LogRemoteCommands": ''...removed
** Using 'zabbix-server' servers for passive checks
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Server": 'zabbix-server'...updated
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "ListenPort": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "ListenIP": ''...removed
** Using 'zabbix-server:10051' servers for active checks
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "ServerActive": 'zabbix-server:10051'...updated
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "EnablePersistentBuffer": '0'...added
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "StatusPort": '31999'...added
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostInterface": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostInterfaceItem": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Hostname": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostnameItem": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostMetadata": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "HostMetadataItem": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "RefreshActiveChecks": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "BufferSend": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "BufferSize": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "MaxLinesPerSecond": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Timeout": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Include": '/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf'...updated
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "Include": '/etc/zabbix/zabbix_agentd.d/'...added first occurrence
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "UnsafeUserParameters": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSConnect": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSAccept": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSCAFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSCRLFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSServerCertIssuer": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSServerCertSubject": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSCertFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSKeyFile": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSPSKIdentity": ''...removed
** Updating '/etc/zabbix/zabbix_agent2.conf' parameter "TLSPSKFile": ''...removed
2022/04/22 20:33:36.762571 Starting Zabbix Agent 2 (6.0.3)

The container environment is:

            "Env": [
                "ZBX_ENABLESTATUSPORT=true",
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "TERM=xterm",
                "ZBX_VERSION=6.0.3",
                "ZBX_SOURCES=https://git.zabbix.com/scm/zbx/zabbix.git"
            ],
wbrown64 commented 2 years ago

Should that variable work on agent1?

dotneft commented 2 years ago

No. I thought we were talking about agent2 :-)

wbrown64 commented 2 years ago

We were, but I had been playing around with agent1 in the meantime. All good info, though. Do you have an alternative solution for agent1? Thanks for the help so far.

dotneft commented 2 years ago

Currently, I think, only port 10050 or the PID file (but the latter is not a good way)...

wbrown64 commented 2 years ago

Hmm. I can confirm that setting ZBX_ENABLESTATUSPORT=true, together with probing port 31999, works on agent2. However, I would still ask for help finding a solution for agent1, as the current agent DaemonSet spec at https://github.com/zabbix/zabbix-docker/blob/5.0/kubernetes.yaml will not work if passive mode is disabled, which should be an accepted use case.

dotneft commented 2 years ago

Enable passive mode with some dummy IP address for the Zabbix server :-)

wbrown64 commented 2 years ago

I hate to be difficult, but with that config I am still getting connection refused on agent1 (5.0):

   containers:
      - env:
        - name: ZBX_DEBUGLEVEL
          value: "3"
        - name: ZBX_DENYKEY
          value: system.run[*]
        - name: ZBX_ALLOWKEY
        - name: ZBX_LOGREMOTECOMMANDS
          value: "1"
        - name: ZBX_SERVER_HOST
          value: x.x.x.x
        - name: ZBX_PASSIVE_ALLOW
          value: "true"
        - name: ZBX_PASSIVESERVERS
          value: 0.0.0.0/0
        - name: ZBX_ACTIVE_ALLOW
          value: "true"
        - name: ZBX_STARTAGENTS
          value: "0"
        - name: ZBX_HOSTNAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        image: xxx-zabbix/zabbix-agent-1:5.0
        imagePullPolicy: Always
        livenessProbe:
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 2
          successThreshold: 1
          tcpSocket:
            port: 10050
          timeoutSeconds: 1
        name: zabbix-agent
        ports:
        - containerPort: 10050
          hostPort: 10050
          name: zabbix-agent
          protocol: TCP

error:

Liveness probe failed: dial tcp x.x.x.x:10050: connect: connection refused
dotneft commented 2 years ago

ZBX_STARTAGENTS = 1
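Applied to the agent1 (5.0) DaemonSet above, the resolution amounts to keeping passive mode on with a placeholder server and starting at least one passive listener. This is a minimal sketch, assuming the probe settings from that config are kept; the address 127.0.0.1 is a hypothetical stand-in for the "dummy IP" and is not taken from the thread. It relies on a tcpSocket probe only needing the TCP connection to be accepted, so the probe's source address does not have to be in the allowed server list.

```yaml
# Sketch: agent1 (5.0) container fields. Passive mode stays enabled only so
# that the agent opens a listener on 10050 for the Kubernetes probes.
env:
  - name: ZBX_PASSIVE_ALLOW
    value: "true"
  # Hypothetical dummy address: no real Zabbix server polls this agent.
  - name: ZBX_PASSIVESERVERS
    value: "127.0.0.1"
  - name: ZBX_ACTIVE_ALLOW
    value: "true"
  # With StartAgents=0 no passive listener starts, so probes are refused.
  - name: ZBX_STARTAGENTS
    value: "1"
livenessProbe:
  tcpSocket:
    port: 10050
  initialDelaySeconds: 5
  periodSeconds: 2
  failureThreshold: 3
  timeoutSeconds: 1
```

Active checks are unaffected by this change; the passive listener exists only to give the probes something to connect to.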

wbrown64 commented 2 years ago

Ok, this will work for us. Thank you very much @dotneft, appreciate your help on this!