scoopex closed this issue 1 year ago
Hi again :)
Too bad this error did not just go away by migrating the project to another home ...
I still cannot reproduce the error at all.
I have just deployed a Zabbix installation with the following values.yaml and I do not see this kind of behavior. I have to assume something else is wrong in your Kubernetes cluster. Is inter-node communication working? Are network policies blocking inter-node traffic? These are issues I have run into in the past that cause this kind of problem.
Please try an installation with exactly this values.yaml. Have you tried deploying to another Kubernetes cluster and checked whether the behavior is the same? Is there a way for you to give me access to such an installation? Or can we try it the other way around: I give you access to a cluster and we test there?
```yaml
zabbix_image_tag: ubuntu-6.0.8

zabbixserver:
  enabled: true
  replicaCount: 2

postgresql:
  enabled: true
  image:
    repository: postgres
    tag: 14
  persistence:
    enabled: false
    storage_size: 5Gi

zabbixproxy:
  enabled: false

zabbixagent:
  enabled: true

zabbixweb:
  enabled: true
  replicaCount: 2
  extraEnv:
    - name: ZBX_SERVER_NAME
      value: Demo Zabbix

zabbixwebservice:
  enabled: true

ingress:
  enabled: true
  hosts:
    - host: zabbix.kube.demo
      paths:
        - path: /
          pathType: ImplementationSpecific
```
The installation was done with the following command, just for completeness:

```shell
helm -n zabbix install zabbix -f zabbix_values.yaml zabbix-community/zabbix
```
@scoopex Do you have any updates on this from your side? Have you had any success with my last message? Otherwise I would close this issue.
Hi Christian,
we are currently using version 6.0.7. I will test your settings next week.
I suspect that the Kubernetes Service distributes the requests from Zabbix web randomly across both running Zabbix servers, and that the error message appears whenever the request hits the standby server.
Actually, that shouldn't happen. The web interface fetches the address of the "active" Zabbix server instance from the database (this is essentially how the new Zabbix server HA feature works) and connects to that address directly. For this to work, each Zabbix server instance needs to "know" which pod IP address it is running on and write that address to the database when "signing in" to the cluster. Service objects are not used for this at all. If you check the issue you reported in the old repo of this chart, you will find hints I provided there to analyze this further.
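To verify this mechanism, you can look at the HA node table directly. The following is a hedged sketch: the pod name `zabbix-postgresql-0` and the database/user name `zabbix` are assumptions that depend on your release name and values, so adjust them to your deployment.

```shell
# Inspect the HA node registry Zabbix server 6.0 maintains in the database.
# Each server replica should appear with its own pod IP; exactly one node
# should be active at a time.
kubectl -n zabbix exec -it zabbix-postgresql-0 -- \
  psql -U zabbix -d zabbix \
  -c "SELECT name, address, port, status FROM ha_node;"
```

If the `address` values are Service addresses rather than pod IPs, or no node shows up as active, the frontend cannot reach the active server reliably.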
Thanks for your hint. The problem is solved now. The cause was that, for historical reasons, we had defined the "ZBX_SERVER_HOST" environment variable for "zabbix-web". This prevented the automatic discovery and made the frontend use the Service instead. Removing the "ZBX_SERVER_HOST" environment variable solved the problem.
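For anyone hitting the same issue, this is a hedged sketch of what the fix looks like in the chart values; the `ZBX_SERVER_HOST` value shown in the commented-out lines is a hypothetical example of such a legacy override, not taken from the actual configuration.

```yaml
zabbixweb:
  enabled: true
  extraEnv:
    - name: ZBX_SERVER_NAME
      value: Demo Zabbix
    # Remove any legacy override like the following, so the frontend
    # discovers the active Zabbix server from the database instead of
    # going through a load-balanced Service address:
    # - name: ZBX_SERVER_HOST
    #   value: zabbix-zabbix-server
```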
Describe the bug
It seems that the deployed nginx image ("zabbix/zabbix-web-nginx-pgsql:ubuntu-6.0.7") has problems connecting to the Zabbix server instance. This problem appeared around release 3.0.1 of the chart, when running clustered Zabbix servers, and still exists with Helm chart release 3.2.2.
Reducing the replicas to a single Zabbix server makes the warning disappear.
See: ![The warning message](https://user-images.githubusercontent.com/288876/182042269-9ac03cd0-7e5e-4356-b51b-f4d0e49b29dd.png)
How to reproduce it (as minimally and precisely as possible):
Open Zabbix web and wait a few minutes for the error message to appear.
See also https://github.com/cetic/helm-zabbix/issues/71 for more context.