dcharleston opened this issue 3 years ago
Hi, is anyone taking a look at the above issue? I'm also having the same problem.
same issue
Okay, here's the underlying issue as I see it in my minikube environment running on my Fedora Linux system. The sensu-backend readinessProbe is failing in a weird way because IPv6 isn't quite fully supported in minikube yet.
It looks like minikube is letting the sensu-backend bind its TCP API to IPv6 localhost TCP port 8080 instead of IPv4 TCP port 8080, and there doesn't seem to be an obvious way to prevent minikube from allowing this to happen. Here's what it looks like from inside the sensu-backend-0 container running under my minikube:
```
$ netstat -tlpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address    Foreign Address  State   PID/Program name
tcp   0      0      127.0.0.1:6060   0.0.0.0:*        LISTEN  1/sensu-backend
tcp   0      0      127.0.0.1:3030   0.0.0.0:*        LISTEN  -
tcp   0      0      127.0.0.1:3031   0.0.0.0:*        LISTEN  -
tcp   0      0      :::8081          :::*             LISTEN  1/sensu-backend
tcp   0      0      :::8080          :::*             LISTEN  1/sensu-backend
tcp   0      0      :::3000          :::*             LISTEN  1/sensu-backend
```
Those last 3 services are listening on IPv6, and that's definitely not good.
The k8s configurations provided in this repo assume IPv4 will be used in the pods. The sensu-backend readinessProbe uses the busybox-provided wget in an Alpine container, which is not IPv6 compatible.
We need to either figure out a way to configure minikube so it doesn't let that happen, or figure out a way to tell the sensu-backend to explicitly bind on IPv4 localhost; a sketch of the second option is below.
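For reference, the sensu-backend exposes listen-address options that could pin those listeners to IPv4. Here's a minimal sketch of what that might look like as container args in the backend StatefulSet; the flag names come from the Sensu backend docs, but the manifest layout and image tag are illustrative, not the exact files in this repo:

```yaml
# Hypothetical excerpt of the sensu-backend container spec; adjust to match
# the StatefulSet actually shipped in this repo.
containers:
  - name: sensu-backend
    image: sensu/sensu:6.2.5                # illustrative tag
    command:
      - sensu-backend
      - start
      - --api-listen-address=0.0.0.0:8080   # HTTP API on IPv4 instead of [::]:8080
      - --agent-host=0.0.0.0                # agent websocket listener (port 8081)
      - --dashboard-host=0.0.0.0            # web UI listener (port 3000)
```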
Turns out this is a problem with the sensu-backend readinessProbe settings. The settings were too aggressive for default minikube resource provisioning, and probes were being started faster than they were timing out, causing a problem.
Please test PR #10 and comment there on the potential fix
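For anyone who wants to try the change by hand, it amounts to relaxing the readinessProbe timings so a slow minikube VM has time to bring the backend up. A sketch of the kind of adjustment follows; the values are illustrative, not necessarily the exact settings in PR #10, and the timing fields work the same whether the probe is an exec wget or an httpGet:

```yaml
readinessProbe:
  httpGet:
    path: /health
    port: 8080
  initialDelaySeconds: 20   # give the backend time to finish etcd startup
  periodSeconds: 10         # probe less often
  timeoutSeconds: 5         # tolerate slow responses instead of failing fast
  failureThreshold: 6       # allow several consecutive failures before marking unready
```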
Did anyone get this working?
@mvthul I believe I have a fix for this, and I have an open PR for it, see previous comment. I just need someone experiencing the problem to test my proposed fix and make sure it works for them.
I applied your changes as far as I could see; everything is green and running, but "context deadline" still appears in the logs. When I log in to Sensu there is a red bar popping up, and if I click details I see "context deadline" under ETCD. I've tried so many things to fix it, and tried so many other Helm charts and scripts. Nothing seems to work with version 6+.
The specific changes needed to solve the problem may require system-specific changes to the configuration... let me explain.
There are timeouts configured for the readiness probes, and if the system running minikube is resource-poor, then those configurations will be too aggressive and the readiness probes will fall over because the underlying service didn't get enough CPU cycles to complete the startup process.
The PR I put together changes these settings enough so that it works on my laptop running minikube. But the nature of the problem is such that even though it works for me, it might fail for someone else with tighter system resources.
There might not be a one-size-fits-all solution here, because we definitely still want the readiness probes to give up at a reasonable point. For something like Google's or Amazon's service, that reasonable point of failure is much sooner than any local minikube deployment... because of available resources.
If as a minikube user you're still having this specific problem, you may need to further adjust the readinessProbe settings to give your minikube deployment more time to provision everything; the back-of-the-envelope math below shows how the fields combine.
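For concreteness, using the illustrative values from the sketch above: a pod whose probe never passes is marked unready after roughly

```
initialDelaySeconds + failureThreshold × periodSeconds
  = 20 + 6 × 10
  = 80 seconds
```

so bumping failureThreshold or periodSeconds is the direct way to buy a slow minikube deployment more startup time.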
I tried in Azure AKS and locally with MicroK8s; both have the same issue 😭
Okay, well, this isn't confined to minikube... this needs to be reinvestigated.
Azure AKS isn't a service I've tested against yet, but I'll look into it.
@mvthul Okay, so for me on minikube, the context deadline exceeded error is most likely due to slow disk access to the virtualized volumes. etcd is sensitive to slow disk performance for its backing store; one way to check this is shown below.
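One way to confirm disk latency is the culprit is to benchmark fdatasync latency on the volume backing etcd, along the lines of the fio test the etcd docs recommend; the directory path here is an assumption, so point it at wherever the etcd data volume is actually mounted:

```sh
# Run inside a container that mounts the etcd data volume (path is illustrative).
fio --rw=write --ioengine=sync --fdatasync=1 \
    --directory=/var/lib/etcd/fio-test \
    --size=22m --bs=2300 --name=etcd-disk-check
# etcd wants the 99th percentile fdatasync latency to stay under ~10ms.
```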
For me the context deadline exceeded messages are intermittent and aren't causing a problem for the intended purpose of kicking the tires in minikube; everything spins up and I'm able to use the Sensu dashboard.
For Azure AKS, you might need to change the storage class associated with the sensu-etcd persistent volume. I don't know what storageClass options AKS has out of the gate, but you'll want a dedicated SSD for the sensu-etcd volume.
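A sketch of what that could look like on AKS, assuming the quick start's etcd claim is named sensu-etcd (the claim name and size are assumptions; the built-in premium SSD class is managed-premium on older clusters or managed-csi-premium on CSI-enabled ones, so check kubectl get storageclass first):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sensu-etcd                      # assumed claim name; match this repo's manifest
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: managed-premium     # AKS premium SSD; managed-csi-premium on CSI clusters
  resources:
    requests:
      storage: 10Gi                     # illustrative size
```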
I was experiencing the same issue on Tanzu Kubernetes. The PR seems to work as expected; I think you should merge it.
This issue has been mentioned on Sensu Community. There might be relevant details there:
https://discourse.sensu.io/t/issues-installing-sensu-6-10-on-eks/3137/2
I'm following the README and using all default settings, running locally on minikube. The sensu-backend pod repeatedly fails because the readiness check for the backend's /health endpoint never passes. It returns:
etcd cluster health comes back as healthy from both the etcd and the sensu-backend containers:
The following errors appear in the sensu-backend logs: