Open LovingFox opened 3 months ago
Incorrect usage of ZBX_TLSPSKFILE. Please check the documentation: https://github.com/zabbix/zabbix-docker/tree/7.0/Dockerfiles/agent2#varlibzabbixenc
As you can see in my config (see below), I mount the correct file as ZBX_TLSPSKFILE. The TLS connection is established correctly, which means it doesn't matter where ZBX_TLSPSKFILE is located. Something else is wrong...
-v /etc/zabbix/agent.pass:/etc/zabbix/agent.pass \
...
--env ZBX_TLSPSKFILE=/etc/zabbix/agent.pass \
It does not matter; there is no point in checking anything while it is not properly configured. Specify only the file name in the variable and mount the file to /var/lib/zabbix/enc. Then check again and provide full logs from Zabbix agent start, with debug level 4 enabled.
Also, I recommend you check the problem without TLS connection options. Do you have a direct connection between server and agent, without any additional services (NAT, load balancers, etc.)?
Also, I recommend you check the problem without TLS connection options.
Yes, I ran some tests: with TLS switched off everything works fine, with no errors.
Do you have a direct connection between server and agent, without any additional services (NAT, load balancers, etc.)?
The host with the agent has no NAT or load balancers in front of it. The host running zabbix-server is an AWS VM, which means there is a 1:1 NAT on the AWS cloud network side.
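One way to separate a PSK problem from a network problem is to run a single passive check by hand. The following is only a sketch: `check_agent_psk` is a hypothetical helper name, it assumes the `zabbix_get` utility is installed on a host that the agent's server list allows (for example the Zabbix server itself), and the host, identity and file path are placeholders to replace with your own values.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zabbix): run one passive agent.ping check
# over a PSK-encrypted connection using the zabbix_get utility.
# Arguments: agent host, PSK identity, PSK file path, optional port (default 10050).
check_agent_psk() {
    host=$1
    identity=$2
    psk_file=$3
    port=${4:-10050}
    zabbix_get -s "$host" -p "$port" -k agent.ping \
        --tls-connect psk \
        --tls-psk-identity "$identity" \
        --tls-psk-file "$psk_file"
}
```

Calling it as `check_agent_psk <agent-host> my_host /etc/zabbix/enc/agent.pass` should print `1` when the agent answers. If it does, while the agent log still fills with errors, the PSK settings themselves are probably not the cause.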
It does not matter; there is no point in checking anything while it is not properly configured. Specify only the file name in the variable and mount the file to /var/lib/zabbix/enc. Then check again and provide full logs from Zabbix agent start, with debug level 4 enabled.
Done. The log file is attached: zabbix-agent.log. The errors are logged immediately.
agent run config:
sudo docker run --name zabbix-agent \
--restart unless-stopped \
--net=host --pid=host \
--privileged --restart=always --init \
-v /etc/localtime:/etc/localtime:ro \
-v /var/run/docker.sock:/var/run/docker.sock \
-v /etc/zabbix/enc:/var/lib/zabbix/enc \
--env ZBX_TLSCONNECT=psk \
--env ZBX_TLSACCEPT=psk \
--env ZBX_TLSPSKIDENTITY=my_host \
--env ZBX_TLSPSKFILE=agent.pass \
--env ZBX_SERVER_HOST=1.2.3.4 \
--env ZBX_ACTIVE_ALLOW=false \
--env ZBX_DEBUGLEVEL=4 \
-d zabbix/zabbix-agent2:ubuntu-7.0.0
Thank you for the explanation. Currently I see the following steps to check the problem:
As a data point, the same problem (many "failed to process an incoming connection" entries in the Zabbix agent log) is happening in non-Docker setups too:
https://support.zabbix.com/browse/ZBX-22351
I mention this because it's currently happening on my systems (not in Docker), both on host servers (physical hardware) and on VMs running on those servers. :frowning:
Checking one of the host servers just now, about 98% of the Zabbix agent log consists of these messages:
# wc -l zabbix_agent2.log
540778 zabbix_agent2.log
# grep 'failed to process an incoming connection from' zabbix_agent2.log | wc -l
529908
So of the 540k lines in the log file so far today, 530k are this message being repeated over and over.
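For anyone who wants to measure the same ratio on their own host, here is a minimal sketch that wraps the two commands above in one function. `log_share` is a hypothetical name, and the default pattern is simply the message quoted in this thread.

```shell
#!/bin/sh
# Hypothetical helper: report how many lines of a log file contain a message,
# and what share of the whole file that is.
# Usage: log_share FILE [PATTERN]
log_share() {
    file=$1
    pattern=${2:-failed to process an incoming connection from}
    total=$(wc -l < "$file")
    # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
    matches=$(grep -cF -- "$pattern" "$file" || true)
    awk -v m="$matches" -v t="$total" 'BEGIN {
        if (t == 0) { print "0 of 0 lines"; exit }
        printf "%d of %d lines (%.1f%%)\n", m, t, 100 * m / t
    }'
}
```

Run it as `log_share zabbix_agent2.log`. With the counts shown above (529908 of 540778 lines) the share works out to roughly 98%.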
Do you monitor "net.tcp.service[tcp,,10050]"?
No, I have no idea what that is. If that's some custom setting, then there's definitely no chance it was present.
In the meantime, I've rebuilt the entire setup to use Zabbix 6.0 LTS instead, which isn't exhibiting the problem.
I'm still having the same issue:
docker.io/zabbix/zabbix-agent2:alpine-6.4.13
Other people using the Zabbix container (monitoring a k8s cluster) have a similar issue too. Any idea?
Any idea?
Is downgrading to Zabbix 6.0 feasible?
@justinclift Unfortunately, I am unable to downgrade to version 6.0.
What image was working fine?
I tried many agent versions (7.0.X, 6.0.X) on my k8s cluster with no luck. Every time I get "failed to process an incoming connection from..."
I have a 7.0.2 server on Debian 12.
Interesting. Sounds like the problem is with Zabbix Server v7, and it might not be Zabbix Agent 2 causing issues.
At the moment I'm using active mode; passive doesn't work.
@justinclift The problem also occurs with version 6.4.13.
@rzemykers I'm using an active proxy.
@matzmz Just for clarity, that's version 6.4.13 of the Zabbix server yeah?
@justinclift, in my setup, both the Zabbix proxy and agent2 are running as Docker containers, and both are at version 6.4.13. The Zabbix server, which is running on a virtual machine, is also at version 6.4.13. The errors I'm encountering are related to the Zabbix agent2 side.
I'm also getting this error with agents deployed on a Kubernetes cluster (Kind) with Helm. It's not difficult to reproduce, just run it...
I'm also facing the same issue with agents and proxies in a k8s cluster. Does anyone have any solutions or workarounds?
Same here. 6-host vSphere + vSAN cluster, 13 agent 2 hosts (Linux/Windows), 33 SNMP hosts. Zabbix server 7.0.3 + MySQL in Docker on a separate VM.
Log from the Debian 12 vm with agent 2 (without docker):
It seems that this problem occurs around the log line "plugin Cpu: executing collector task".
The connection between server and agent constantly fails. Zabbix server 7.0.0 is set up without Docker; zabbix-agent 2 on the remote host runs in Docker.
There are no errors if I use zabbix-agent 1 in Docker.
Logs
1.2.3.4 is the server IP; my_host is the name of the agent on the remote host.
Agent logs:
Server logs the same time:
Versions
Server:
Agent:
Configs:
Server:
Agent: