zabbix-community / helm-zabbix

Helm chart for Zabbix
https://artifacthub.io/packages/helm/zabbix-community/zabbix
Apache License 2.0

External Zabbix Agent 2 appearing as Unknown on Zabbix Frontend #67

Closed pavlovnicola closed 1 month ago

pavlovnicola commented 4 months ago

Hi,

I have brought Zabbix Server up in a K8s cluster using Helm.

I have external Linux VMs (outside the K8s cluster) that I would like to monitor. I have installed Zabbix Agent 2 on one Linux VM and configured it as Active. I can see the agent in Zabbix Frontend (appearing as Unknown), but it shows the wrong IP: an IP that belongs to K8s.

Zabbix Server is exposed as a NodePort (30051). I have set up an external Load Balancer on the same NodePort (30051). The agent connects to the LB VIP on port 30051.

Any hint?

aeciopires commented 4 months ago

Hi @pavlovnicola!

I need more information to help you. Could you please send screenshots of Zabbix Frontend (the configuration screens of the VM) and the log files of the Zabbix Agent installed in the VM and of the Zabbix Server in k8s?

For security reasons you can hide the first three octets of the IPs. Example: x.x.x.21

pavlovnicola commented 4 months ago

Hi @aeciopires

Thank you for getting back to me.

Zabbix Server log from K8s:

328:20240213:072452.141 cannot send list of active checks to "10.250.68.192": host [klocwork-lic] not found

10.250.68.192 is a K8s IP. It does not belong to any Service/Pod/Endpoint.
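
The server message above pairs two things: the address the connection appeared to come from, and the host name the agent announced. "not found" refers to the host name, meaning no host with that exact name is configured on the server side. As a minimal sketch, the two fields can be pulled out of the server log like this. The pattern is derived only from the single line quoted above, so other Zabbix versions may format it differently.

```python
import re

# Pattern built from the one server log line quoted in this thread;
# treat the exact layout as an assumption for other Zabbix versions.
NOT_FOUND = re.compile(
    r'cannot send list of active checks to "(?P<peer>[^"]+)": '
    r'host \[(?P<host>[^\]]+)\] not found'
)

def parse_not_found(line):
    """Return (peer_address, host_name), or None if the line does not match."""
    match = NOT_FOUND.search(line)
    if match is None:
        return None
    return match.group("peer"), match.group("host")

line = ('328:20240213:072452.141 cannot send list of active checks to '
        '"10.250.68.192": host [klocwork-lic] not found')
print(parse_not_found(line))
```

The host name in the second field is the value to compare against the host name configured in the frontend.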

Zabbix Agent logs:

2024/02/13 09:24:50.564045 Calling C function "tls_new_context()"
2024/02/13 09:24:50.564343 Calling C function "tls_new_context()"
2024/02/13 09:24:50.564496 Calling C function "tls_version()"
2024/02/13 09:24:50.564511 OpenSSL library (OpenSSL 1.1.1  11 Sep 2018) initialized
2024/02/13 09:24:50.564518 Calling C function "tls_describe_ciphersuites()"
2024/02/13 09:24:50.564546 Calling C function "free()"
2024/02/13 09:24:50.564561 default context ciphersuites: TLS_CHACHA20_POLY1305_SHA256 TLS_AES_128_GCM_SHA256 PSK-AES128-CBC-SHA
2024/02/13 09:24:50.564568 Calling C function "tls_describe_ciphersuites()"
2024/02/13 09:24:50.564575 Calling C function "free()"
2024/02/13 09:24:50.564665 psk context ciphersuites: TLS_CHACHA20_POLY1305_SHA256 TLS_AES_128_GCM_SHA256 PSK-AES128-CBC-SHA256 PSK-AES128-CBC-SHA
2024/02/13 09:24:50.564708 using configuration file: /etc/zabbix/zabbix_agent2.conf
2024/02/13 09:24:50.564783 using plugin 'Agent' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.564792 using plugin 'Ceph' (built-in) providing following interfaces: exporter, runner, configurator
2024/02/13 09:24:50.564800 using plugin 'Cpu' (built-in) providing following interfaces: exporter, collector, runner
2024/02/13 09:24:50.564811 using plugin 'DNS' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.564818 using plugin 'Docker' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.564825 using plugin 'File' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.564881 using plugin 'Hw' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.564896 using plugin 'Kernel' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.564906 using plugin 'Log' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.564916 using plugin 'MQTT' (built-in) providing following interfaces: watcher, configurator
2024/02/13 09:24:50.564927 using plugin 'Memcached' (built-in) providing following interfaces: exporter, runner, configurator
2024/02/13 09:24:50.564938 using plugin 'Memory' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.564945 using plugin 'Modbus' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.564957 using plugin 'MongoDB' (/usr/sbin/zabbix-agent2-plugin/zabbix-agent2-plugin-mongodb) providing following interfaces: exporter, runner, configurator
2024/02/13 09:24:50.564974 using plugin 'Mysql' (built-in) providing following interfaces: exporter, runner, configurator
2024/02/13 09:24:50.564983 using plugin 'NetIf' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.564991 using plugin 'Oracle' (built-in) providing following interfaces: exporter, runner, configurator
2024/02/13 09:24:50.565001 using plugin 'PostgreSQL' (/usr/sbin/zabbix-agent2-plugin/zabbix-agent2-plugin-postgresql) providing following interfaces: exporter, runner, configurator
2024/02/13 09:24:50.565023 using plugin 'Proc' (built-in) providing following interfaces: exporter, collector
2024/02/13 09:24:50.565031 using plugin 'ProcExporter' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565037 using plugin 'Redis' (built-in) providing following interfaces: exporter, runner, configurator
2024/02/13 09:24:50.565045 using plugin 'Smart' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565055 using plugin 'Sw' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565062 using plugin 'Swap' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565068 using plugin 'SystemRun' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565074 using plugin 'Systemd' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565081 using plugin 'TCP' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565089 using plugin 'UDP' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565099 using plugin 'Uname' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565105 using plugin 'Uptime' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565113 using plugin 'Users' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565122 using plugin 'VFSDev' (built-in) providing following interfaces: exporter, collector
2024/02/13 09:24:50.565129 using plugin 'VFSDir' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565147 using plugin 'VfsFs' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565153 using plugin 'WebCertificate' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565160 using plugin 'WebPage' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565166 using plugin 'ZabbixAsync' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565180 using plugin 'ZabbixStats' (built-in) providing following interfaces: exporter, configurator
2024/02/13 09:24:50.565187 lowering the plugin ZabbixSync capacity to 1 as the configured capacity 100 exceeds limits
2024/02/13 09:24:50.565191 using plugin 'ZabbixSync' (built-in) providing following interfaces: exporter
2024/02/13 09:24:50.565249 [Modbus] Config is valid
2024/02/13 09:24:50.565299 Plugin communication protocol version is 6.0.13
2024/02/13 09:24:50.565324 starting manager
2024/02/13 09:24:51.000166 [0] processing update request (1 requests)
2024/02/13 09:24:51.000201 [0] registering new client
2024/02/13 09:24:51.000209 Calling C function "new_global_regexp()"
2024/02/13 09:24:51.000247 [0] adding new request for key: 'system.hostname'
2024/02/13 09:24:51.000269 [0] created direct exporter task for plugin 'Uname' itemid:0 key 'system.hostname'
2024/02/13 09:24:51.000308 executing direct exporter task for key 'system.hostname'
2024/02/13 09:24:51.000320 executed direct exporter task for key 'system.hostname'
2024/02/13 09:24:51.000337 Zabbix Agent2 hostname: [klocwork-lic]
2024/02/13 09:24:51.000422 [101] starting memory cache
2024/02/13 09:24:51.000568 [101] starting server connector for [x.x.x.93:30051]
2024/02/13 09:24:51.000673 [0] starting listener for '0.0.0.0:10050'
2024/02/13 09:24:51.000983 listening for control connections on /run/zabbix/agent.sock
2024/02/13 09:24:52.001418 [101] In refreshActiveChecks() from [x.x.x.93:30051]
2024/02/13 09:24:52.001474 [0] processing update request (1 requests)
2024/02/13 09:24:52.001486 [0] adding new request for key: 'system.uname'
2024/02/13 09:24:52.001494 [0] created direct exporter task for plugin 'Uname' itemid:0 key 'system.uname'
2024/02/13 09:24:52.001559 executing direct exporter task for key 'system.uname'
2024/02/13 09:24:52.001569 executed direct exporter task for key 'system.uname'
2024/02/13 09:24:52.001652 connecting to [x.x.x.93:30051] [timeout:3s, connection timeout:3s]
2024/02/13 09:24:52.002163 sending [{"request":"active checks","host":"klocwork-lic","version":"6.0","host_metadata":"Linux klocwork-lic 5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020 x86_64"}] to [x.x.x.93:30051]
2024/02/13 09:24:52.002740 receiving data from [x.x.x.93:30051]
2024/02/13 09:24:52.142834 received [{"response":"failed","info":"host [klocwork-lic] not found"}] from [x.x.x.93:30051]
2024/02/13 09:24:52.143048 [101] no active checks on server [x.x.x.93:30051]: host [klocwork-lic] not found
2024/02/13 09:24:52.143091 [101] End of refreshActiveChecks() from [x.x.x.93:30051]
2024/02/13 09:24:52.143102 [101] processing update request (0 requests)
2024/02/13 09:24:52.143106 [101] skipping empty update for unregistered client
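
The "sending" and "received" lines in the agent log are one request/response pair of the active checks exchange: a JSON body wrapped in the Zabbix packet header. The sketch below frames and unframes such a body, assuming the plain header layout (the bytes `ZBXD`, a flags byte of `0x01`, a 4-byte little-endian length, and 4 reserved bytes). It ignores TLS and compression, and it does not open a network connection.

```python
import json
import struct

HEADER = b"ZBXD"
FLAG_PLAIN = 0x01  # assumed: plain packet, no compression, no large-packet mode

def frame(payload: dict) -> bytes:
    """Wrap a JSON payload in the assumed 13-byte Zabbix header."""
    body = json.dumps(payload).encode("utf-8")
    # 1-byte flags, 4-byte data length, 4-byte reserved, all little-endian
    return HEADER + struct.pack("<BII", FLAG_PLAIN, len(body), 0) + body

def unframe(packet: bytes) -> dict:
    """Reverse of frame(): validate the header and decode the JSON body."""
    if packet[:4] != HEADER:
        raise ValueError("not a Zabbix packet")
    flags, length, _reserved = struct.unpack("<BII", packet[4:13])
    if flags != FLAG_PLAIN:
        raise ValueError("unsupported flags")
    return json.loads(packet[13:13 + length].decode("utf-8"))

# Same fields as the request in the log above (host_metadata omitted for brevity).
request = {"request": "active checks", "host": "klocwork-lic", "version": "6.0"}
packet = frame(request)
print(packet[:5], unframe(packet)["host"])
```

Note that the request carries the host name but no IP address: the address shown in the server log is simply the peer address of the TCP connection as the server sees it.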

image

Zabbix Agent Config:

LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=0
DebugLevel=5
Server=0.0.0.0/0
ServerActive=x.x.x.93:30051
HostMetadataItem=system.uname
Include=/etc/zabbix/zabbix_agent2.d/*.conf
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=./zabbix_agent2.d/plugins.d/*.conf

aeciopires commented 4 months ago

Hi @pavlovnicola!

I haven't yet taken the time to reproduce your setup in my test environment, but from the information you shared, it looks like the source IP of the VM running the Zabbix Agent is being translated by NAT to an IP from the Kubernetes cluster network on its way to the Zabbix Server. Then, on the way back, Zabbix Server cannot match the connection to the right host.

It would be worth using Wireshark or Kubeshark to do a packet analysis, and also checking whether any CNI (Container Network Interface) configuration can be adjusted. CNI examples: Calico, Weave Net, etc.
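
One possible source of this kind of NAT, not confirmed in this thread, is the Service itself: with a NodePort Service and the default `externalTrafficPolicy: Cluster`, traffic may be source-NATed to a node address before it reaches the pod, while `Local` preserves the client IP but only accepts traffic on nodes that run the pod. The external load balancer may also rewrite the source address on its own. The sketch below only builds a `kubectl patch` command for that field. The Service name and namespace are hypothetical placeholders, and a manual patch may be reverted by the next Helm upgrade.

```python
import json
import shlex

def traffic_policy_patch(policy: str = "Local") -> dict:
    """Build a patch body that sets spec.externalTrafficPolicy."""
    if policy not in ("Local", "Cluster"):
        raise ValueError(f"unknown externalTrafficPolicy: {policy!r}")
    return {"spec": {"externalTrafficPolicy": policy}}

def kubectl_patch_command(service: str, namespace: str, policy: str = "Local") -> str:
    """Return a shell-quoted kubectl command; nothing is executed here."""
    patch = json.dumps(traffic_policy_patch(policy))
    return shlex.join(["kubectl", "patch", "service", service,
                       "--namespace", namespace, "--patch", patch])

# "zabbix-server-nodeport" and "monitoring" are made-up names for illustration.
print(kubectl_patch_command("zabbix-server-nodeport", "monitoring"))
```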

Again, I may be talking nonsense...

Anyway, I also did a little research and found some reports of this same problem, even when Kubernetes is not used... How about checking out the tips?

pavlovnicola commented 4 months ago

Hi @aeciopires

I have been trying a few things. I cloned an existing template (Linux Zabbix agent) and changed its type to Active. The status of the agent is initially "Unknown". After that it changes to RED. The reason is that the Server is trying to reach it on the wrong IP address.

I have read that "Active agents" (behind a firewall) should remain "Unknown". I can live with that. I do not want the Server to try to reach the agent. My question now is: how do I tell the Server not to reach out to the agent, and only wait for requests from the agent?

aeciopires commented 4 months ago

Hi @pavlovnicola!

I understand your point about Zabbix Server trying to communicate with a device behind a firewall in active mode.

It is perhaps more effective to ask about this at https://www.zabbix.com/forum/ or https://support.zabbix.com/secure/Dashboard.jspa

Another option is to check the Zabbix Roadmap https://www.zabbix.com/roadmap, the Zabbix release notes https://www.zabbix.com/release_notes, and the documentation of the Zabbix configuration files:

https://www.zabbix.com/documentation/current/en/manual/appendix/config/zabbix_server
https://www.zabbix.com/documentation/current/en/manual/appendix/config/zabbix_proxy
https://www.zabbix.com/documentation/current/en/manual/appendix/config/zabbix_agent2
https://www.zabbix.com/documentation/current/en/manual/appendix/config/zabbix_agentd
https://www.zabbix.com/documentation/current/en/manual/appendix/config/zabbix_java
https://www.zabbix.com/documentation/current/en/manual/appendix/config/zabbix_web_service

In any case, I think it can be a good idea to monitor external devices outside Kubernetes through a Zabbix Proxy installed outside Kubernetes, which sends the data about the monitored devices to the Zabbix Server in Kubernetes.

This way, Zabbix Server only needs a return path to a single IP, which is easier to allow on the firewall and to live with than having to open N firewall rules for N devices on N networks.

Furthermore, the Zabbix Proxy can be used to cache and retain data for a certain period of time, if communication with the Zabbix Server drops, whether because the Zabbix Server pod is being recreated, connectivity problems, lack of resources or another reason.

ViperousTiger commented 2 months ago

@pavlovnicola Not sure if it helps, but there's a known issue with Zabbix: if you are using active monitoring, the Zabbix frontend will show Unknown, but if you switch to passive it'll show Available. This was the broad consensus on the Zabbix subreddit: "The "online indicator" is just showing if the agent is reachable in passive mode if configured so. It's a bit misleading since it never shows anything when only using active checks. Or worse: showing offline when the agent was changed from passive to active only. You should only rely on checks via items and their triggers."
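
Following that quote, one thing that may be worth checking after cloning a template is whether any items are still of the passive type, since those are the ones the server polls on the host interface. A minimal sketch, assuming items shaped like the output of the Zabbix API `item.get` method, where `type` `0` is "Zabbix agent" (passive) and `7` is "Zabbix agent (active)". The sample items below are made up for illustration.

```python
PASSIVE_AGENT = "0"  # item type "Zabbix agent" (server connects to the agent)
ACTIVE_AGENT = "7"   # item type "Zabbix agent (active)" (agent connects to the server)

def passive_item_keys(items):
    """Return the keys of items that are still polled in passive mode."""
    return [item["key_"] for item in items if str(item["type"]) == PASSIVE_AGENT]

# Hypothetical sample; real data would come from an item.get API call.
sample_items = [
    {"key_": "agent.ping", "type": "0"},
    {"key_": "system.uname", "type": "7"},
    {"key_": "vfs.fs.size[/,pused]", "type": "7"},
]
print(passive_item_keys(sample_items))
```

An empty result would suggest that the host has no passive agent items left for the server to poll.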

aeciopires commented 1 month ago

I'm closing this issue, but you can reopen it if you need to make any adjustments to the code. If you have any questions, you can use the Discussions feature https://github.com/zabbix-community/helm-zabbix/discussions