splunk / splunk-operator

Splunk Operator for Kubernetes

Review livenessProbe for pod startup or reduce the number of Splunk restarts in ansible playbook #233

Open romain-bellanger opened 3 years ago

romain-bellanger commented 3 years ago

Since upgrading from Splunk 8.0 to 8.1.0.1, we have been hitting an issue where the liveness probe fails before the startup Ansible playbook completes on the cluster-master pod.

The liveness probe is configured with:

      failureThreshold: 3
      initialDelaySeconds: 300
      periodSeconds: 30
      successThreshold: 1
      timeoutSeconds: 30

This gives the pod roughly 6 minutes to start: the first probe runs after 300 seconds, and the third consecutive failure comes 60 seconds later. On some of our multisite clusters, the startup playbook triggers 4 Splunk restarts on the cluster-master pod, each taking approximately 50 seconds, and the playbook was taking 7 minutes to complete, causing Kubernetes to reschedule the pod. As a workaround, we've patched the exporter to extend initialDelaySeconds to 450.
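To make the timing above easier to check, here is a rough sketch of the arithmetic in Python. The helper name `liveness_budget_seconds` is made up for illustration, and the model is simplified: it assumes the first probe fires exactly at `initialDelaySeconds`, that later probes follow every `periodSeconds`, and that a hanging probe is counted as failed only after `timeoutSeconds`. Real kubelet timing can differ by a few seconds.

```python
def liveness_budget_seconds(initial_delay, period, failure_threshold, timeout=0):
    """Approximate window before a never-healthy container is restarted.

    Returns (earliest, latest): earliest assumes each probe fails
    immediately, latest assumes the final probe runs into its timeout.
    This is a simplified model, not the kubelet's exact behavior.
    """
    # The first probe starts after initial_delay; each subsequent probe
    # starts one period later, so the Nth probe starts here:
    last_probe_start = initial_delay + (failure_threshold - 1) * period
    return last_probe_start, last_probe_start + timeout


# Values from the probe configuration quoted above.
original = liveness_budget_seconds(300, 30, 3, timeout=30)

# Same settings with initialDelaySeconds patched to 450.
patched = liveness_budget_seconds(450, 30, 3, timeout=30)

playbook_seconds = 7 * 60  # observed startup playbook duration

print(original, playbook_seconds > original[1])  # budget exceeded
print(patched, playbook_seconds < patched[0])    # playbook fits
```

Under these assumptions the original settings allow between 360 and 390 seconds, which is less than the 420 seconds the playbook needed, while the patched delay allows at least 510 seconds.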

Several actions could be considered:

pogdin commented 2 years ago

CSPL-569

akondur commented 1 year ago

Hi @romain-bellanger, we have made the livenessProbe and readinessProbe configurable in the latest release. Could you please check and let us know whether we can close this issue?