Open kapishreshth opened 1 month ago
This tls handshake error keeps coming. Not sure what that ip:port is?
You'd have to determine which pod that IP belongs to assuming it is a client on the pod network.
Prometheus operator is regularly accessed by only two groups of clients: Prometheus, when scraping its metrics endpoint, and kube-apiserver, when communicating with the webhook.
If you enable TLS in Prometheus operator, its service monitor is adjusted for TLS so that Prometheus scrapes over TLS as an https client. As for the webhook, kube-apiserver refuses to communicate with it without TLS, so it is always an https client.
See whether you can find that client's IP address amongst pods' IP addresses, e.g. with a command like this:
kubectl get pod \
-o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,IP:.status.podIP'
Depending on your permissions, you can apply it on your monitoring namespace or cluster-wide (-A). I reckon that client runs outside of the monitoring stack.
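To avoid scanning the whole pod list by eye, the lookup can be narrowed to the one address from the log. The following is only a sketch: the function names `extract_client_ip` and `find_pod_by_ip` are made up for illustration, the pattern handles IPv4 addresses only, and a client that is not a pod (a node, a host-network agent, a proxy) will simply return no match.

```shell
#!/usr/bin/env bash

# Read operator log lines on stdin and print the client IP of every
# "TLS handshake error from IP:PORT" entry (IPv4 only).
extract_client_ip() {
  sed -n 's/.*TLS handshake error from \([0-9.]*\):[0-9]*.*/\1/p'
}

# List the pod (if any) whose pod IP equals the given address.
# Uses -A, so it needs permission to list pods cluster-wide.
find_pod_by_ip() {
  kubectl get pod -A --field-selector "status.podIP=$1" -o wide
}
```

For example, piping a line such as `msg="http: TLS handshake error from 10.10.1.63:55470: remote error: tls: bad certificate"` through `extract_client_ip` yields the address to pass to `find_pod_by_ip`. If nothing comes back, `kubectl get nodes -o wide` is worth checking next, since the address may belong to a node rather than a pod.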
On my cluster it seems to be cilium's envoy proxy, as far as I can tell:
operator logs:
level=error caller=/opt/hostedtoolcache/go/1.23.1/x64/src/net/http/server.go:3487 msg="http: TLS handshake error from 10.10.1.63:55470: remote error: tls: bad certificate"
kubectl -n kube-system exec ds/cilium -- cilium status --all-controllers --all-health --all-redirects
...
Proxy Status: OK, ip 10.10.1.63, 0 redirects active on ports 10000-20000, Envoy: external
...
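The comparison between the log and the Cilium status output can be scripted rather than done by eye. This is a rough sketch that assumes the `Proxy Status:` line keeps the format shown above; `cilium_proxy_ip` is a hypothetical helper name, not part of Cilium.

```shell
#!/usr/bin/env bash

# Read `cilium status` output on stdin and print the IP reported on the
# "Proxy Status:" line (IPv4 only; prints nothing if the line is absent).
cilium_proxy_ip() {
  sed -n 's/^ *Proxy Status:.* ip \([0-9.]*\),.*/\1/p'
}
```

Piping the `kubectl -n kube-system exec ds/cilium -- cilium status ...` command from above into `cilium_proxy_ip` prints the proxy address of the agent that `ds/cilium` resolves to. Each node runs its own agent, so a mismatch on one node does not rule out the proxy on another.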
But there's also:
level=warn caller=/home/runner/work/prometheus-operator/prometheus-operator/pkg/server/server.go:164 msg="server TLS client verification disabled" client_ca_file=/etc/tls/private/tls-ca.crt err="stat /etc/tls/private/tls-ca.crt: no such file or directory"
Is it possible that the helm chart doesn't configure the admission webhook correctly to use the cluster CA?
The prometheus-operator has a detailed guide: https://prometheus-operator.dev/docs/platform/webhook/
but I don't see any Certificate resource being created by the chart.
Perhaps cilium-envoy tries to contact the admission webhook and fails?
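One way to test the CA suspicion is to look at the CA bundle the API server has been told to trust for the webhook. In Go's TLS stack a "remote error: tls: bad certificate" generally means the peer, here the client, rejected the certificate the server presented, so a stale or mismatched bundle would be consistent with the log. The sketch below assumes `openssl` and `base64` are installed locally; `inspect_webhook_ca` is a hypothetical helper, and the webhook configuration name has to be looked up first because it depends on the release name.

```shell
#!/usr/bin/env bash

# Print subject, issuer and expiry of the first certificate in the
# caBundle of the named ValidatingWebhookConfiguration (first webhook only).
inspect_webhook_ca() {
  kubectl get validatingwebhookconfiguration "$1" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}' \
    | base64 -d \
    | openssl x509 -noout -subject -issuer -enddate
}
```

`kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations` lists the candidate names to pass in. If the issuer printed here does not match the one that signed the certificate mounted into the operator pod, the API server's calls to the webhook would fail verification.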
Describe the bug
I did a fresh checkout of the "kube-prometheus-stack" helm chart and set it up on an AWS EKS cluster. All pods are running fine. I set agent mode as agentMode: true in the values.yaml file. It can scrape pod metrics to Grafana. Everything works as expected except one error I observed in the operator pod logs, as follows. This tls handshake error keeps coming. Not sure what that ip:port is?
Another tls error was also there before this one. To fix that one, I added the change below to the values.yaml file under the kubeEtcd ServiceMonitor component, and it worked.
serviceMonitor:
  tlsConfig:
    insecureSkipVerify: true
However, I still have no clue about the tls error shown in the screenshot above. It would be an immense help if someone could provide any input. Thank you!
Do let me know if the information is not sufficient. Please excuse the formatting, I am posting for the first time.
What's your helm version?
version.BuildInfo{Version:"v3.15.2", GitCommit:"1a500d5625419a524fdae4b33de351cc4f58ec35", GitTreeState:"clean", GoVersion:"go1.22.4"}
What's your kubectl version?
Client Version: v1.29.2 Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3 Server Version: v1.27.16-eks-a737599
Which chart?
kube-prometheus-stack in agent mode
What's the chart version?
63.1.0
What you expected to happen?
No response
How to reproduce it?
No response
Enter the changed values of values.yaml?
No response
Enter the command that you execute and failing/misfunctioning.
helm install kube-prometheus-stack
Anything else we need to know?
No response