spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

Does sentinel listen to SIGKILL? #686

Closed igoooor closed 3 months ago

igoooor commented 5 months ago

Expected behaviour

I thought that if the sentinel pod receives a SIGKILL, it would mark itself "not ready" so that the load balancer service stops sending requests to it.
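
For context on the signal itself: a process cannot catch or react to SIGKILL, so nothing inside the container can mark itself "not ready" in response to it. Kubernetes normally sends SIGTERM first when it deletes a pod and only falls back to SIGKILL once the termination grace period has expired. The sketch below is a generic Python illustration of that difference, not code from sentinel or the operator; `Readiness` and `can_trap` are hypothetical names, and it assumes a Linux host.

```python
import signal


class Readiness:
    """Hypothetical flag that a readiness endpoint could consult."""

    def __init__(self):
        self.ready = True

    def on_sigterm(self, signum, frame):
        # SIGTERM can be handled: flip to not-ready before shutting down.
        self.ready = False


def can_trap(sig):
    """Return True if this process is allowed to install a handler for sig."""
    try:
        previous = signal.signal(sig, lambda signum, frame: None)
    except (OSError, ValueError):
        # OSError: the kernel refuses handlers for this signal (e.g. SIGKILL).
        # ValueError: invalid signal number, or not called from the main thread.
        return False
    # Put the earlier handler back; None means it was not installed from Python.
    if previous is not None:
        signal.signal(sig, previous)
    return True


if __name__ == "__main__":
    state = Readiness()
    signal.signal(signal.SIGTERM, state.on_sigterm)
    print("can trap SIGTERM:", can_trap(signal.SIGTERM))
    print("can trap SIGKILL:", can_trap(signal.SIGKILL))
```

If this reading is right, any "go not ready first" behaviour would have to happen on SIGTERM or in a preStop hook within the grace period. Whether the operator configures either for the sentinel container is not something this thread confirms.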

Actual behaviour

My pod was stopped because of a node scale-down (so the pod was moved to a different node), and a request still went to that sentinel pod, which resulted in a "redis went away" error.

Environment

How are the pieces configured?

Logs

Container logs at the time of the event:

INFO 2024-02-02T10:34:45.033200246Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:34:45.033 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:34:45.038454080Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:34:45.038 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:15.528001493Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.527 # +set master mymaster 10.2.85.14 6379 down-after-milliseconds 1500
INFO 2024-02-02T10:35:15.532660946Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.532 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:15.533390024Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.532 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:35:15.538118999Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:15.537 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:45.424395751Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.424 # +set master mymaster 10.2.85.14 6379 down-after-milliseconds 1500
INFO 2024-02-02T10:35:45.430492822Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.430 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:35:45.430856163Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.430 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:35:45.435810640Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:35:45.435 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:36:15.728698109Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.727 # +set master mymaster 10.2.85.14 6379 down-after-milliseconds 1500
INFO 2024-02-02T10:36:15.733848405Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.733 * Sentinel new configuration saved on disk
INFO 2024-02-02T10:36:15.734494313Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.734 # +set master mymaster 10.2.85.14 6379 failover-timeout 3000
INFO 2024-02-02T10:36:15.740620714Z [resource.labels.containerName: sentinel] 1:X 02 Feb 2024 10:36:15.740 * Sentinel new configuration saved on disk

I see nothing special there. Pod logs at the time of the event:

INFO 2024-02-02T10:36:38Z [resource.labels.podName: rfs-conteo-prod-redis-7fc8844655-2qbcw] deleting pod for node scale down
INFO 2024-02-02T10:36:39Z [resource.labels.podName: rfs-conteo-prod-redis-7fc8844655-2qbcw] Stopping container sentinel
WARNING 2024-02-02T10:36:40Z [resource.labels.podName: rfs-conteo-prod-redis-7fc8844655-2qbcw] Readiness probe errored: rpc error: code = NotFound desc = failed to exec in container: failed to load task: no running task found: task f2ca421fd292a37ed46ea1dab6ac5116b2820a8051078765b1a1004a438e4455 not found: not found

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 3 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.