openanalytics / shinyproxy-operator

Easily run ShinyProxy on a Kubernetes cluster
https://shinyproxy.io
Apache License 2.0

Liveness and Readiness probe errors for ShinyProxy deployment #2

Closed cnukwas closed 3 years ago

cnukwas commented 3 years ago

I deployed the ShinyProxy Operator as is and noticed that the ShinyProxy pod goes into CrashLoopBackOff status due to liveness and readiness probe errors. I don't see these probes mentioned anywhere in either crd.yaml or shinyproxy.yaml, so I'm not sure where they are configured. Is there a way to configure or disable these probes?

Note that I did not apply any of the YAML files related to Skipper-Ingress, since we want to use our own Ingress Controller. I don't think that is related, but I'm noting it just in case.

Warning  Unhealthy  36s (x6 over 44s)  kubelet, mynode04  Liveness probe failed: Get http://192.168.222.152:8080/actuator/health/liveness: dial tcp 192.168.222.152:8080: connect: connection refused
Warning  Unhealthy  36s (x5 over 43s)  kubelet, mynode04  Readiness probe failed: Get http://192.168.222.152:8080/actuator/health/readiness: dial tcp 192.168.222.152:8080: connect: connection refused
Normal   Killing    36s (x2 over 42s)  kubelet, mynode04  Container shinyproxy failed liveness probe, will be restarted

cnukwas commented 3 years ago

I tried to replace or change the probes, but none of my attempts worked, so I added the YAML below under the kubernetesPodTemplateSpecPatches section to temporarily disable the probes until a root cause is found.

- op: remove
  path: /spec/containers/0/readinessProbe
- op: remove
  path: /spec/containers/0/livenessProbe
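
As a rough sketch, this is how the patch might look in context, assuming `kubernetesPodTemplateSpecPatches` takes a block scalar containing a JSON Patch list, as in the replace example elsewhere in this thread. The exact position of the key inside the ShinyProxy resource is an assumption and is not shown here.

```yaml
# Sketch only: the block-scalar form is assumed from the replace example in this thread.
# JSON Patch "remove" fails if the target path does not exist, so this
# presumes the operator has already set both probes on container 0.
kubernetesPodTemplateSpecPatches: |
  - op: remove
    path: /spec/containers/0/readinessProbe
  - op: remove
    path: /spec/containers/0/livenessProbe
```

With both probes removed, Kubernetes no longer restarts the container or withholds traffic based on the health endpoints, so this is only suitable as a temporary workaround.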

LEDfan commented 3 years ago

In our configurations, the liveness and readiness probes work correctly, but only on a Kubernetes version that supports startup probes; otherwise you will see behavior similar to what you described.

For deployments running an older Kubernetes version (e.g. 1.17), we use the following configuration successfully:

  kubernetesPodTemplateSpecPatches: |
    - op: replace
      path: /spec/containers/0/livenessProbe
      value:
        failureThreshold: 2
        httpGet:
          path: /actuator/health/liveness
          port: 8080
          scheme: HTTP
        periodSeconds: 1
        initialDelaySeconds: 140
        successThreshold: 1
        timeoutSeconds: 1
    - op: replace
      path: /spec/containers/0/readinessProbe
      value:
        failureThreshold: 2
        httpGet:
          path: /actuator/health/readiness
          port: 8080
          scheme: HTTP
        periodSeconds: 1
        initialDelaySeconds: 140
        successThreshold: 1
        timeoutSeconds: 1
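
For comparison, the fragment below sketches what a startup probe looks like on a cluster that supports it. It is an illustration, not the spec the operator generates: the container name, port, and health path are taken from the events at the top of this thread, while `periodSeconds` and `failureThreshold` are assumed values chosen to give roughly the same startup window as the 140-second delay above.

```yaml
# Illustrative container fragment; thresholds are assumptions, not operator defaults.
containers:
  - name: shinyproxy
    startupProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
        scheme: HTTP
      periodSeconds: 5
      failureThreshold: 30   # up to 5 s x 30 = 150 s for the application to start
    livenessProbe:
      httpGet:
        path: /actuator/health/liveness
        port: 8080
        scheme: HTTP
      periodSeconds: 1
      failureThreshold: 2    # only evaluated after the startup probe has succeeded
```

Kubernetes holds back the liveness and readiness checks until the startup probe succeeds, which is why tight liveness settings do not kill a slowly starting container on such clusters. Without startup probe support, a long `initialDelaySeconds` serves as the substitute.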

LEDfan commented 3 years ago

Improving the configuration of readiness and liveness probes is on our TODO list.

cnukwas commented 3 years ago

@LEDfan, thank you very much for the update. We're on an older Kubernetes version (1.1.4.x) due to other dependencies. I will leave the probes disabled for now, as we will be upgrading to either 1.18 or 1.19 soon. Please add a note about the version dependency if possible.

LEDfan commented 3 years ago

Hi @cnukwas, in the latest version of the operator we have added two options for tuning the readiness and liveness probes, so that you can more easily use it on older Kubernetes versions.

I also added some notes about version requirements to the README.

cnukwas commented 3 years ago

@LEDfan, thank you very much for the update. We will get the latest version, as we're starting to get back to this work.
