Expected behaviour
I thought that when the sentinel pod receives a SIGKILL, it would mark itself "not ready" so that the load balancer service stops sending requests to it.
Actual behaviour
My pod was stopped because the node was scaling down (so the pod was rescheduled onto a different node), yet a request was still routed to the terminating sentinel pod, which ended in a "redis went away" error.
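For reference, this is roughly how a client reaches Redis through sentinel. A minimal sketch using go-redis v9; the Service address (guessed from the pod name in the logs), the timeout, and the retry settings are my assumptions, not our actual application config. A short dial timeout plus retries is one way a client can tolerate the window where a terminating sentinel is still listed as a Service endpoint:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

func main() {
	// Sentinel-aware client. The service name and port below are assumptions
	// based on the pod name in the logs (rfs-conteo-prod-redis-...).
	rdb := redis.NewFailoverClient(&redis.FailoverOptions{
		MasterName:    "mymaster",
		SentinelAddrs: []string{"rfs-conteo-prod-redis:26379"},
		DialTimeout:   500 * time.Millisecond,
		MaxRetries:    3, // retry if a terminating sentinel endpoint drops the connection
	})
	defer rdb.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	if err := rdb.Ping(ctx).Err(); err != nil {
		fmt.Println("redis unavailable:", err)
		return
	}
	fmt.Println("connected to the current master via sentinel")
}
```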
Environment
How are the pieces configured?
Redis Operator version: v1.2.1
Kubernetes version: v1.25.15-gke.1115000
Kubernetes configuration used: not sure what to answer here, sorry.
Logs
Container logs at the time of the event:
INFO 2024-02-02T10:34:45.033200246Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:34:45.033 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:34:45.038454080Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:34:45.038 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:15.528001493Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.527 # +set master mymaster 10.2.85.14 6379 down-after-milliseconds 1500
INFO 2024-02-02T10:35:15.532660946Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.532 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:15.533390024Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.532 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:35:15.538118999Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.537 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:45.424395751Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.424 # +set master mymaster 10.2.85.14 6379 down-after-milliseconds 1500
INFO 2024-02-02T10:35:45.430492822Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.430 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:45.430856163Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.430 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:35:45.435810640Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.435 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:36:15.728698109Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.727 # +set master mymaster 10.2.85.14 6379 down-after-milliseconds 1500
INFO 2024-02-02T10:36:15.733848405Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.733 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:36:15.734494313Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.734 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:36:15.740620714Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.740 * Sentinel new configuration saved on disk
I don't see anything special there.
Pod logs at the time of the event:
INFO 2024-02-02T10:36:38Z [resource.labels.podName: rfs-conteo-prod-redis-7fc8844655-2qbcw] deleting pod for node scale down
INFO 2024-02-02T10:36:39Z [resource.labels.podName: rfs-conteo-prod-redis-7fc8844655-2qbcw] Stopping container sentinel
WARNING 2024-02-02T10:36:40Z [resource.labels.podName: rfs-conteo-prod-redis-7fc8844655-2qbcw] Readiness probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to load task: no running task found: task f2ca421fd292a37ed46ea1dab6ac5116b2820a8051078765b1a1004a438e4455 not found: not found
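For what it's worth: on a graceful node drain the kubelet sends SIGTERM first (SIGKILL only follows after the termination grace period), and removal of the pod from the Service endpoints happens asynchronously, so there is a window where traffic can still reach the stopping sentinel; the probe error above suggests the container was already gone before the readiness probe could flip the pod to not-ready. Below is a minimal sketch of the kind of workaround I had in mind, expressed with the Kubernetes Go API types; the preStop sleep and its duration are my assumptions, not something the operator configures today as far as I know:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	"sigs.k8s.io/yaml"
)

func main() {
	// Sketch of a sentinel container with a preStop delay: the pod is marked
	// terminating (and starts being removed from Service endpoints) while the
	// sleep runs, and only afterwards does the process receive SIGTERM.
	c := corev1.Container{
		Name:  "sentinel",
		Image: "redis:6-alpine", // placeholder image, not the operator's actual default
		Lifecycle: &corev1.Lifecycle{
			PreStop: &corev1.LifecycleHandler{
				Exec: &corev1.ExecAction{
					Command: []string{"sh", "-c", "sleep 10"},
				},
			},
		},
	}

	out, err := yaml.Marshal(c)
	if err != nil {
		panic(err)
	}
	fmt.Print(string(out))
}
```

The idea is simply that the sleep gives the endpoints controller and kube-proxy time to drop the pod from the Service before sentinel actually shuts down.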