We are using docker-splunk in Kubernetes and therefore use `checkstate.sh` as the liveness probe. The problem is that `checkstate.sh` executes the following to check whether Splunk is still running:

```
curl --max-time 30 --fail --insecure $scheme://localhost:8089/
```

So it only checks whether splunkd still answers on the management port 8089, but during shutdown that is probably the last thing to go: Splunk Web, HECs, receivers, etc. are all already gone while this endpoint still returns 200. As long as it returns 200, a LoadBalancer or something like ingress-nginx will happily keep sending traffic to the endpoint, leading to timeouts and broken connections.
My proposal to fix this would be to apply the following logic in `checkstate.sh`:

- Check whether there are HECs and receivers configured.
- If so, assess the liveness of the container based on the responses of those ports rather than 8089.
- If not, stay with the current check and see whether 8089 is still available.
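To make the idea concrete, here is a rough sketch of what that logic could look like. This is hypothetical, not the real `checkstate.sh`: the port numbers are the Splunk defaults (8088 for HEC, 9997 for receiving, 8089 for management), and the stanza-based detection of configured inputs is just one possible approach.

```shell
#!/bin/bash
# Hypothetical sketch of the proposed liveness logic -- not the real
# checkstate.sh. Ports are Splunk defaults and would need to match the
# actual deployment.

scheme="${scheme:-https}"

port_listening() {
  # True if something accepts TCP connections on localhost:$1.
  timeout 1 bash -c "exec 3<>/dev/tcp/localhost/$1" 2>/dev/null
}

inputs_configured() {
  # One possible way to detect configured data inputs: look for
  # [http] (HEC) or [splunktcp] (receiver) stanzas under etc/.
  grep -rqsE '^\[(http|splunktcp)' \
    "${SPLUNK_HOME:-/opt/splunk}/etc" 2>/dev/null
}

check_liveness() {
  if inputs_configured; then
    # Data inputs exist: judge liveness by the data ports.
    port_listening 8088 || port_listening 9997
  else
    # No data inputs: fall back to the existing management-port check.
    curl --max-time 30 --fail --insecure \
      "$scheme://localhost:8089/" >/dev/null 2>&1
  fi
}
```

The probe would then call `check_liveness` and use its exit code. Instead of a raw TCP check, the HEC could also be probed over HTTP via its health endpoint (`/services/collector/health`), which reports whether HEC can actually accept events.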
There may be even better ways to achieve this; maybe someone has an idea?

If we agree on a fix, I would be happy to create an MR to solve this.