Closed mmdaz closed 8 months ago
@ese @hoffoo @hashemi-soroush
I don't recall seeing any logic in the operator for initiating the change of master. Can you please share the logs of Redis and Sentinel instances and the operator?
old master redis logs:
1:M 14 Aug 2023 10:35:35.313 * Background saving terminated with success
1:M 14 Aug 2023 10:36:13.876 * Synchronization with replica 10.0.17.193:6379 succeeded
1:M 14 Aug 2023 10:36:14.135 * Synchronization with replica 10.0.11.124:6379 succeeded
1:M 14 Aug 2023 10:36:14.187 * Synchronization with replica 10.0.74.43:6379 succeeded
1:M 14 Aug 2023 10:36:14.427 * Synchronization with replica 10.0.83.38:6379 succeeded
1:M 14 Aug 2023 10:36:14.436 * Synchronization with replica 10.0.79.50:6379 succeeded
1:M 14 Aug 2023 10:36:14.445 * Synchronization with replica 10.0.104.200:6379 succeeded
1:M 14 Aug 2023 10:36:14.466 * Synchronization with replica 10.0.29.50:6379 succeeded
1:M 14 Aug 2023 10:36:14.485 * Synchronization with replica 10.0.62.111:6379 succeeded
1:M 14 Aug 2023 10:36:14.686 * Synchronization with replica 10.0.32.74:6379 succeeded
1:M 14 Aug 2023 10:51:25.252 # Connection with replica 10.0.74.43:6379 lost.
1:M 14 Aug 2023 10:51:25.672 # Connection with replica 10.0.29.50:6379 lost.
1:S 14 Aug 2023 10:51:35.747 # Connection with replica 10.0.79.50:6379 lost.
1:S 14 Aug 2023 10:51:35.747 # Connection with replica 10.0.104.200:6379 lost.
1:S 14 Aug 2023 10:51:35.747 # Connection with replica 10.0.32.74:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.11.124:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.17.193:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.62.111:6379 lost.
1:S 14 Aug 2023 10:51:35.748 # Connection with replica 10.0.83.38:6379 lost.
1:S 14 Aug 2023 10:51:35.748 * Before turning into a replica, using my own master parameters to synthesize a cached master: I may be able to synchronize with the new master with just a partial transfer.
1:S 14 Aug 2023 10:51:35.748 * REPLICAOF 10.0.74.43:6379 enabled (user request from 'id=77760 addr=10.0.47.71:48712 fd=11 name=sentinel-e09abc0d-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=4 qbuf=198 qbuf-free=32570 argv-mem=4 obl=45 oll=0 omem=0 tot-mem=61468 events=r cmd=exec user=default')
1:S 14 Aug 2023 10:51:35.748 # Could not create tmp config file (Read-only file system)
1:S 14 Aug 2023 10:51:35.748 # CONFIG REWRITE failed: Invalid argument
1:S 14 Aug 2023 10:51:36.012 * Connecting to MASTER 10.0.74.43:6379
1:S 14 Aug 2023 10:51:36.012 * MASTER <-> REPLICA sync started
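The REPLICAOF line in the log above records which client asked the old master to become a replica, which helps answer whether the operator or Sentinel initiated the change. Below is a minimal sketch that extracts the requester from such a line, assuming the log format shown above. The function name `replicaof_requester` is hypothetical, and reading a client name of the form `sentinel-<id>-cmd` as a Sentinel command connection is an assumption about Sentinel's naming rather than something the log states.

```python
import re

# Matches the "REPLICAOF ... enabled (user request from '...')" log line.
REPLICAOF_RE = re.compile(
    r"REPLICAOF (?P<host>[\d.]+):(?P<port>\d+) enabled "
    r"\(user request from '(?P<client>[^']*)'\)"
)


def replicaof_requester(line):
    """Return the new master and requesting client, or None if no match."""
    m = REPLICAOF_RE.search(line)
    if m is None:
        return None
    # The client description is a space-separated list of key=value pairs.
    fields = dict(kv.split("=", 1) for kv in m.group("client").split())
    return {
        "new_master": f"{m.group('host')}:{m.group('port')}",
        "client_addr": fields.get("addr"),
        "client_name": fields.get("name"),
    }


# The line from the old master's log, split only for readability.
sample_line = (
    "1:S 14 Aug 2023 10:51:35.748 * REPLICAOF 10.0.74.43:6379 enabled "
    "(user request from 'id=77760 addr=10.0.47.71:48712 fd=11 "
    "name=sentinel-e09abc0d-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 "
    "multi=4 qbuf=198 qbuf-free=32570 argv-mem=4 obl=45 oll=0 omem=0 "
    "tot-mem=61468 events=r cmd=exec user=default')"
)

print(replicaof_requester(sample_line))
```

If the client name begins with `sentinel-`, the request most likely came from a Sentinel instance rather than from the operator.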
operator log:
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:48:13Z" level=info msg="deployment updated" deployment=SENTINEL namespace=NS service=k8s.deployment src="deployment.go:109"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:48Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=REDIS src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:48Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=SENTINEL src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="configMap updated" configMap=SENTINEL namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="configMap updated" configMap=rfr-readiness-share-redis namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="configMap updated" configMap=REDIS namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=REDIS service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="statefulSet updated" namespace=NS service=k8s.statefulSet src="statefulset.go:108" statefulSet=REDIS
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=SENTINEL service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:50:49Z" level=info msg="deployment updated" deployment=SENTINEL namespace=NS service=k8s.deployment src="deployment.go:109"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:14Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=REDIS src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="service updated" namespace=NS service=k8s.service serviceName=SENTINEL src="service.go:99"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="configMap updated" configMap=SENTINEL namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="configMap updated" configMap=rfr-readiness-share-redis namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="configMap updated" configMap=REDIS namespace=NS service=k8s.configMap src="configmap.go:84"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=REDIS service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="statefulSet updated" namespace=NS service=k8s.statefulSet src="statefulset.go:108" statefulSet=REDIS
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="podDisruptionBudget updated" namespace=NS podDisruptionBudget=SENTINEL service=k8s.podDisruptionBudget src="poddisruptionbudget.go:85"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:15Z" level=info msg="deployment updated" deployment=SENTINEL namespace=NS service=k8s.deployment src="deployment.go:109"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:17Z" level=info msg="Update pod label, namespace: NS, pod name: REDIS-1, labels: map[redisfailovers-role:slave]" service=k8s.pod src="check.go:96"
redis-sentinel-redis-operator-5f59d5b9c4-drcvk redis-operator time="2023-08-14T10:53:17Z" level=info msg="Update pod label, namespace: NS, pod name: REDIS-5, labels: map[redisfailovers-role:master]" service=k8s.pod src="check.go:87"
sentinel logs:
1:X 14 Aug 2023 10:36:44.116 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:36:44.129 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:40:17.722 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:40:17.728 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:42:58.014 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:42:58.064 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:45:34.006 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:45:34.022 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:48:19.383 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:48:19.386 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:50:56.519 # +set master mymaster 10.0.45.248 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:50:56.531 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000
1:X 14 Aug 2023 10:51:24.498 # +sdown master mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:24.571 # +new-epoch 156
1:X 14 Aug 2023 10:51:24.573 # +vote-for-leader d6df21606fde2f570b0ddef79a247325773db095 156
1:X 14 Aug 2023 10:51:24.581 # +odown master mymaster 10.0.45.248 6379 #quorum 3/2
1:X 14 Aug 2023 10:51:24.581 # Next failover delay: I will not start a failover before Mon Aug 14 10:51:44 2023
1:X 14 Aug 2023 10:51:24.636 # -sdown master mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:24.637 # -odown master mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:25.670 # +config-update-from sentinel d6df21606fde2f570b0ddef79a247325773db095 10.0.48.66 26379 @ mymaster 10.0.45.248 6379
1:X 14 Aug 2023 10:51:25.671 # +switch-master mymaster 10.0.45.248 6379 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.672 * +slave slave 10.0.17.193:6379 10.0.17.193 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.79.50:6379 10.0.79.50 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.83.38:6379 10.0.83.38 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.32.74:6379 10.0.32.74 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.11.124:6379 10.0.11.124 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.62.111:6379 10.0.62.111 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.104.200:6379 10.0.104.200 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.29.50:6379 10.0.29.50 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:25.679 * +slave slave 10.0.45.248:6379 10.0.45.248 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.83.38:6379 10.0.83.38 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.11.124:6379 10.0.11.124 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.104.200:6379 10.0.104.200 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.62.111:6379 10.0.62.111 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +convert-to-slave slave 10.0.45.248:6379 10.0.45.248 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.17.193:6379 10.0.17.193 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.750 * +fix-slave-config slave 10.0.32.74:6379 10.0.32.74 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:51:35.751 * +fix-slave-config slave 10.0.79.50:6379 10.0.79.50 6379 @ mymaster 10.0.74.43 6379
1:X 14 Aug 2023 10:53:21.467 # +set master mymaster 10.0.74.43 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:53:21.480 # +set master mymaster 10.0.74.43 6379 failover-timeout 10000
1:X 14 Aug 2023 10:55:59.318 # +set master mymaster 10.0.74.43 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:55:59.344 # +set master mymaster 10.0.74.43 6379 failover-timeout 10000
1:X 14 Aug 2023 10:58:25.800 # +set master mymaster 10.0.74.43 6379 down-after-milliseconds 5000
1:X 14 Aug 2023 10:58:25.867 # +set master mymaster 10.0.74.43 6379 failover-timeout 10000
1:X 14 Aug 2023 11:00:06.910 # +sdown master mymaster 10.0.74.43 6379
1:X 14 Aug 2023 11:00:07.034 # -sdown master mymaster 10.0.74.43 6379
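The Sentinel log above is dominated by periodic `+set` lines, so the failover itself is easy to miss. Below is a minimal sketch that keeps only the down-state and switch events and returns them as a timeline, assuming the line format shown above and English month abbreviations. The names `failover_timeline` and `FAILOVER_EVENTS` are hypothetical and chosen for illustration.

```python
import re
from datetime import datetime

# Matches Sentinel lines such as:
#   1:X 14 Aug 2023 10:51:24.498 # +sdown master mymaster 10.0.45.248 6379
LINE_RE = re.compile(
    r"^\d+:X (?P<ts>\d{2} \w{3} \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) [#*] "
    r"(?P<event>[+-][a-z-]+) (?P<rest>.*)$"
)

# Events that describe a failover; the periodic "+set" lines are skipped.
FAILOVER_EVENTS = {"+sdown", "-sdown", "+odown", "-odown", "+switch-master"}


def failover_timeline(lines):
    """Return (timestamp, event, details) tuples for failover-related lines."""
    timeline = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("event") in FAILOVER_EVENTS:
            ts = datetime.strptime(m.group("ts"), "%d %b %Y %H:%M:%S.%f")
            timeline.append((ts, m.group("event"), m.group("rest")))
    return timeline


# A few lines copied from the Sentinel log above.
sample = [
    "1:X 14 Aug 2023 10:50:56.531 # +set master mymaster 10.0.45.248 6379 failover-timeout 10000",
    "1:X 14 Aug 2023 10:51:24.498 # +sdown master mymaster 10.0.45.248 6379",
    "1:X 14 Aug 2023 10:51:24.571 # +new-epoch 156",
    "1:X 14 Aug 2023 10:51:24.581 # +odown master mymaster 10.0.45.248 6379 #quorum 3/2",
    "1:X 14 Aug 2023 10:51:24.636 # -sdown master mymaster 10.0.45.248 6379",
    "1:X 14 Aug 2023 10:51:24.637 # -odown master mymaster 10.0.45.248 6379",
    "1:X 14 Aug 2023 10:51:25.671 # +switch-master mymaster 10.0.45.248 6379 10.0.74.43 6379",
]

for ts, event, details in failover_timeline(sample):
    print(ts.time(), event, details)
```

Running this over the full log from each Sentinel gives a count of `+switch-master` events per hour, which can then be compared against the memory graph.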
I'm not sure what's going on here. Can you please share the Redisfailover CR and describe how I can reproduce it?
This usually, though not always, happens when there is heavy write traffic on the master node and memory usage suddenly increases. In this graph of memory consumption, each change of color marks a change of master.
This issue is stale because it has been open for 45 days with no activity.
This issue was closed because it has been inactive for 14 days since being marked as stale.
Hey, I have a question: when does the operator decide to change the master? In one of our clusters everything is fine and the master is not going down, yet the master changes many times an hour. Is there any configuration or documentation about this?