spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

When is master changed? #641

Closed mmdaz closed 8 months ago

mmdaz commented 11 months ago

Hey. I have a question. When does the operator decide to change the master? In one of our clusters everything is fine and the master is not going down, but the master changes many times an hour. Is there any config or doc about this?

mmdaz commented 10 months ago

@ese @hoffoo @hashemi-soroush

hashemi-soroush commented 10 months ago

I don't recall seeing any logic in the operator for initiating the change of master. Can you please share the logs of Redis and Sentinel instances and the operator?

mmdaz commented 10 months ago

old master redis logs:

1:M 14 Aug 2023 10:35:35.313 * Background saving terminated with success
1:M 14 Aug 2023 10:36:13.876 * Synchronization with replica 10.0.17.193:6379 succeeded
1:M 14 Aug 2023 10:36:14.135 * Synchronization with replica 10.0.11.124:6379 succeeded
1:M 14 Aug 2023 10:36:14.187 * Synchronization with replica 10.0.74.43:6379 succeeded
1:M 14 Aug 2023 10:36:14.427 * Synchronization with replica 10.0.83.38:6379 succeeded
1:M 14 Aug 2023 10:36:14.436 * Synchronization with replica 10.0.79.50:6379 succeeded
1:M 14 Aug 2023 10:36:14.445 * Synchronization with replica 10.0.104.200:6379 succeeded
1:M 14 Aug 2023 10:36:14.466 * Synchronization with replica 10.0.29.50:6379 succeeded
1:M 14 Aug 2023 10:36:14.485 * Synchronization with replica 10.0.62.111:6379 succeeded
1:M 14 Aug 2023 10:36:14.686 * Synchronization with replica 10.0.32.74:6379 succeeded
1:M 14 Aug 2023 10:51:25.252 # Connection with replica 10.0.74.43:6379 lost.
1:M 14 Aug 2023 10:51:25.672 # Connection with replica 10.0.29.50:6379 lost.
1:S 14 Aug 2023 10:51:35.747 # Connection with replica 10.0.79.50:6379 lost.
1:S 14 Aug 2023 10:51:35.747 # Connection with replica 10.0.104.200:6379 lost.
1:S 14 Aug 2023 10:51:35.747 # Connection with replica 10.0.32.74:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.11.124:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.17.193:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.62.111:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.83.38:6379 lost.
1:S 14 Aug 2023 10:51:35.748 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 14 Aug 2023 10:51:35.748 * REPLICAOF 10.0.74.43:6379 enabled (user request from 'id=77760 addr=10.0.47.71:48712 fd=11 name=sentinel-e09abc0d-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=198 qbuf-free=32570 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default')
1:S 14 Aug 2023 10:51:35.748 # Could not create tmp config file (Read-only file system)
1:S 14 Aug 2023 10:51:35.748 # CONFIG REWRITE failed: Invalid argument
1:S 14 Aug 2023 10:51:36.012 * Connecting to MASTER 10.0.74.43:6379
1:S 14 Aug 2023 10:51:36.012 * MASTER <-> REPLICA sync started

operator log:

redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:48:13Z" level=info msg="deployment updated" deployment=SENTINEL namespace=NS service=k8s.deployment src="deployment.go:109"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:48Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=REDIS src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:48Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=SENTINEL src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="configMap updated" configMap=SENTINEL namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="configMap updated" configMap=rfr-readiness-share-redis namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="configMap updated" configMap=REDIS namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=REDIS service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="statefulSet updated" namespace=NS service=k8s.statefulSet src="statefulset.go:108" statefulSet=REDIS
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=SENTINEL service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="deployment updated" deployment=SENTINEL namespace=NS service=k8s.deployment src="deployment.go:109"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:14Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=REDIS src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=SENTINEL src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="configMap updated" configMap=SENTINEL namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="configMap updated" configMap=rfr-readiness-share-redis namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="configMap updated" configMap=REDIS namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=REDIS service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="statefulSet updated" namespace=NS service=k8s.statefulSet src="statefulset.go:108" statefulSet=REDIS
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=SENTINEL service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="deployment updated" deployment=SENTINEL namespace=NS service=k8s.deployment src="deployment.go:109"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:17Z" level=info msg="Update pod label, namespace: NS, pod name: REDIS-1, labels: map[redisfailovers-role:slave]" service=k8s.pod src="check.go:96"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:17Z" level=info msg="Update pod label, namespace: NS, pod name: REDIS-5, labels: map[redisfailovers-role:master]" service=k8s.pod src="check.go:87"

sentinel logs:

1:X 14 Aug 2023 10:36:44.116 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:36:44.129 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:40:17.722 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:40:17.728 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:42:58.014 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:42:58.064 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:45:34.006 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:45:34.022 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:48:19.383 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:48:19.386 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:50:56.519 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:50:56.531 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:51:24.498 # +sdown master mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:24.571 # +new-epoch 156
1:X 14 Aug 2023 10:51:24.573 # +vote-for-leader d6df21606fde2f570b0ddef79a247325773db095 156
1:X 14 Aug 2023 10:51:24.581 # +odown master mymaster 10.0.45.248 6379 #quorum 3/2
1:X 14 Aug 2023 10:51:24.581 # Next failover delay: I will not start a failover before Mon Aug 14 10:51:44 2023
1:X 14 Aug 2023 10:51:24.636 # -sdown master mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:24.637 # -odown master mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:25.670 # +config-update-from sentinel d6df21606fde2f570b0ddef79a247325773db095 10.0.48.66 26379 @ mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:25.671 # +switch-master mymaster 10.0.45.248 6379 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.672 * +slave slave 10.0.17.193:6379 10.0.17.193 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.79.50:6379 10.0.79.50 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.83.38:6379 10.0.83.38 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.32.74:6379 10.0.32.74 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.11.124:6379 10.0.11.124 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.62.111:6379 10.0.62.111 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.104.200:6379 10.0.104.200 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.29.50:6379 10.0.29.50 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.45.248:6379 10.0.45.248 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.83.38:6379 10.0.83.38 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.11.124:6379 10.0.11.124 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.104.200:6379 10.0.104.200 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.62.111:6379 10.0.62.111 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +convert-to-slave slave 10.0.45.248:6379 10.0.45.248 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.17.193:6379 10.0.17.193 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.32.74:6379 10.0.32.74 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.751 * +fix-slave-config slave 10.0.79.50:6379 10.0.79.50 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:53:21.467 # +set master mymaster 10.0.74.43 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:53:21.480 # +set master mymaster 10.0.74.43 6379 failover-timeout 10000
1:X 14 Aug 2023 10:55:59.318 # +set master mymaster 10.0.74.43 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:55:59.344 # +set master mymaster 10.0.74.43 6379 failover-timeout 10000
1:X 14 Aug 2023 10:58:25.800 # +set master mymaster 10.0.74.43 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:58:25.867 # +set master mymaster 10.0.74.43 6379 failover-timeout 10000
1:X 14 Aug 2023 11:00:06.910 # +sdown master mymaster 10.0.74.43 6379
1:X 14 Aug 2023 11:00:07.034 # -sdown master mymaster 10.0.74.43 6379
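
To make the failover sequence in the sentinel log above easier to follow, the relevant events (`+sdown`, `+odown`, `+switch-master`) can be pulled out with a short script. This is a minimal sketch, assuming the log lines follow the layout shown above; the helper names `parse_events` and `master_switches` are hypothetical and not part of Redis or the operator.

```python
import re
from datetime import datetime

# Matches Sentinel log lines such as:
#   1:X 14 Aug 2023 10:51:24.498 # +sdown master mymaster 10.0.45.248 6379
# Assumption: the layout is "<pid>:X <timestamp> <#|*> <event> <details>".
LINE_RE = re.compile(
    r"^\d+:X (?P<ts>\d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) [#*] "
    r"(?P<event>[+-][\w-]+) (?P<rest>.*)$"
)


def parse_events(lines, wanted=("+sdown", "+odown", "+switch-master")):
    """Return (timestamp, event, details) tuples for the wanted event types."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("event") in wanted:
            ts = datetime.strptime(m.group("ts"), "%d %b %Y %H:%M:%S.%f")
            events.append((ts, m.group("event"), m.group("rest")))
    return events


def master_switches(events):
    """Return (timestamp, old_master, new_master) for each +switch-master."""
    switches = []
    for ts, event, rest in events:
        if event == "+switch-master":
            # Details look like: "<name> <old_ip> <old_port> <new_ip> <new_port>"
            _name, old_ip, old_port, new_ip, new_port = rest.split()
            switches.append((ts, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}"))
    return switches


if __name__ == "__main__":
    sample = [
        "1:X 14 Aug 2023 10:51:24.498 # +sdown master mymaster 10.0.45.248 6379",
        "1:X 14 Aug 2023 10:51:24.581 # +odown master mymaster 10.0.45.248 6379 #quorum 3/2",
        "1:X 14 Aug 2023 10:51:25.671 # +switch-master mymaster 10.0.45.248 6379 10.0.74.43 6379",
    ]
    for ts, old, new in master_switches(parse_events(sample)):
        print(f"{ts.isoformat()} master changed {old} -> {new}")
```

Running this over the full sentinel log of each pod, and comparing the `+sdown` timestamps with the memory graph, may help show whether the switches line up with the configured `down-after-milliseconds 5000`.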

hashemi-soroush commented 10 months ago

I'm not sure what's going on here. Can you please share the Redisfailover CR and describe how I can reproduce it?

mmdaz commented 10 months ago

> I'm not sure what's going on here. Can you please share the Redisfailover CR and describe how I can reproduce it?

This usually (though not always) happens when there is a lot of writing on the master node and memory usage suddenly increases. In this graph of memory consumption, each point where the color changes is a change of master.

Screenshot from 2023-08-20 10-17-51

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 8 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.