
kAppNav global issues

Readiness & Liveness probe failed error during Kappnav startup #175

Closed: vipinmenon closed this issue 4 years ago

vipinmenon commented 4 years ago

When I run the following command on minikube:

curl -L https://raw.githubusercontent.com/kappnav/operator/master/releases/latest/kappnav.yaml | sed "s|kubeEnv: okd|kubeEnv: minikube|" | kubectl create -f - -n kappnav

I see that two of the containers never come up (each of these pods shows only 1/2 containers ready):

vipins-MacBook-Pro:KAppNav vipinmv$ kubectl get pods -n kappnav
NAME                                  READY   STATUS    RESTARTS   AGE
kappnav-controller-56f76df848-kbhpc   1/2     Running   1          5m26s
kappnav-operator-b45749494-rdbp7      1/1     Running   0          5m35s
kappnav-ui-85d4f4c8b7-lvf2s           1/2     Running   1          5m26s

When I look at the events for the kappnav-controller pod, this is what I see:

Events:
  Type     Reason     Age                   From               Message
  ----     ------     ----                  ----               -------
  Normal   Scheduled  4m47s                 default-scheduler  Successfully assigned kappnav/kappnav-controller-56f76df848-kbhpc to m01
  Normal   Pulling    4m45s                 kubelet, m01       Pulling image "kappnav/apis:0.7.0"
  Normal   Pulled     4m37s                 kubelet, m01       Successfully pulled image "kappnav/apis:0.7.0"
  Normal   Created    4m36s                 kubelet, m01       Created container kappnav-api
  Normal   Started    4m36s                 kubelet, m01       Started container kappnav-api
  Normal   Pulling    4m36s                 kubelet, m01       Pulling image "kappnav/controller:0.7.0"
  Normal   Pulled     4m29s                 kubelet, m01       Successfully pulled image "kappnav/controller:0.7.0"
  Normal   Created    4m29s                 kubelet, m01       Created container kappnav-controller
  Normal   Started    4m28s                 kubelet, m01       Started container kappnav-controller
  Warning  Unhealthy  78s (x10 over 3m34s)  kubelet, m01       Readiness probe failed: Get https://172.17.0.13:9443/kappnav/health: dial tcp 172.17.0.13:9443: connect: connection refused
  Warning  Unhealthy  74s (x6 over 2m30s)   kubelet, m01       Liveness probe failed: Get https://172.17.0.13:9443/kappnav/health: dial tcp 172.17.0.13:9443: connect: connection refused
  Normal   Killing    71s                   kubelet, m01       Container kappnav-api failed liveness probe, will be restarted

I have tried a couple of restarts, and occasionally I have seen everything up and running, but only rarely. Can somebody help me get this running consistently?

Thanks!

vipinmenon commented 4 years ago

I don't have OCP or OKD installed on my system. I only have minikube, and it fails consistently there.

amylin1 commented 4 years ago

The containers take time to start, and the probes can fail before they are up and running. That would only be a problem if the containers never started, but according to the events you posted, both the kappnav-controller and kappnav-ui containers started successfully. I see the same messages in my OCP console; the controller containers simply take time to start.
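A rough way to see why slow startup turns into restarts: the kubelet restarts a container once its liveness probe has failed failureThreshold times in a row, so a container gets only about initialDelaySeconds + failureThreshold × periodSeconds seconds to start answering its health endpoint. The minimal shell sketch below computes that estimate. The function name liveness_budget is hypothetical, and its defaults are the Kubernetes probe defaults (initialDelaySeconds 0, periodSeconds 10, failureThreshold 3); the probe values actually set in kappnav.yaml are not shown in this thread, so those defaults are an assumption.

```shell
#!/bin/sh
# liveness_budget [INITIAL_DELAY] [PERIOD] [FAILURE_THRESHOLD]
# Hypothetical helper: rough number of seconds a container has to start
# serving its health endpoint before the liveness probe restarts it.
liveness_budget() {
  delay=${1:-0}        # initialDelaySeconds (Kubernetes default: 0)
  period=${2:-10}      # periodSeconds (Kubernetes default: 10)
  threshold=${3:-3}    # failureThreshold (Kubernetes default: 3)
  echo $(( delay + threshold * period ))
}

echo "default probe settings: ~$(liveness_budget) s"
echo "initialDelaySeconds=120: ~$(liveness_budget 120) s"
```

With the defaults that is roughly 30 seconds, which a slow-starting server can easily exceed. The 120-second delay is only an illustrative value, not the one used in the eventual fix.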

vipinmenon commented 4 years ago

@amylin1 Thanks for the response. I have tried waiting for hours, but I only saw all the containers up and running once or twice. As soon as the liveness and readiness probe failures appear, the container restarts, ends up with the same messages, and restarts again. Is there a way to debug this further?

amylin1 commented 4 years ago

In my OCP environment I consistently see a different readiness probe failure before the controller container starts up. It may be related to the readiness initial delay or timeout values. I will try adjusting them to see whether that resolves the issue.

Generated from kubelet on worker0.amylin1.os.fyre.ibm.com Readiness probe failed: Get https://10.254.5.207:9443/kappnav/health: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
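The two environments report different probe errors, and they point at different settings. "connection refused" (in the minikube events) means nothing was accepting connections on port 9443 yet, while "Client.Timeout exceeded" (in the OCP log) means the probe's HTTP request did not complete within its timeout. The hypothetical probe_hint helper below is a heuristic sketch of that mapping, not something provided by kAppNav or Kubernetes.

```shell
#!/bin/sh
# probe_hint MESSAGE
# Hypothetical helper: suggest which probe setting to look at first for a
# kubelet probe-failure message. Heuristic only.
probe_hint() {
  case "$1" in
    *"connection refused"*)
      # Nothing is listening on the port yet, so the server has not
      # finished starting: a longer initialDelaySeconds may help.
      echo "initialDelaySeconds" ;;
    *"Client.Timeout exceeded"*)
      # The request did not complete within the probe timeout:
      # a longer timeoutSeconds may help.
      echo "timeoutSeconds" ;;
    *)
      echo "unknown" ;;
  esac
}

# Messages taken from this thread.
probe_hint "dial tcp 172.17.0.13:9443: connect: connection refused"
probe_hint "net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)"
```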

amylin1 commented 4 years ago

I have increased the initial delay (initialDelaySeconds) and timeout (timeoutSeconds) values, and I no longer see the controller readiness/liveness probe failures. I will close the issue; please open another one if you still see errors.
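One way to choose a value for initialDelaySeconds is to measure how long the container actually takes to start and work backwards from the rough estimate initialDelaySeconds + failureThreshold × periodSeconds. The min_initial_delay helper below is a hypothetical sketch that assumes the Kubernetes default periodSeconds (10) and failureThreshold (3) unless others are passed; the 180-second startup time is illustrative, not measured in this thread.

```shell
#!/bin/sh
# min_initial_delay STARTUP_SECONDS [PERIOD_SECONDS] [FAILURE_THRESHOLD]
# Hypothetical helper: smallest initialDelaySeconds that gives a container
# roughly STARTUP_SECONDS to start before the liveness probe restarts it,
# using the estimate initialDelaySeconds + failureThreshold * periodSeconds.
min_initial_delay() {
  startup=$1
  period=${2:-10}      # Kubernetes default periodSeconds
  threshold=${3:-3}    # Kubernetes default failureThreshold
  need=$(( startup - threshold * period ))
  # A negative result means the probe settings already leave enough time.
  if [ "$need" -lt 0 ]; then
    need=0
  fi
  echo "$need"
}

# Illustrative startup time of 180 s (not measured in this thread).
min_initial_delay 180
```

Adding some margin on top of the result is sensible, since startup time varies between runs and between clusters such as minikube and OCP.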