spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

Redis pods are not coming up when sentinel replicas do not match redis replicas #589

Closed kmcrawford closed 8 months ago

kmcrawford commented 1 year ago

Expected behaviour

When setting sentinel.replicas to 3 and redis.replicas to 5 the cluster should come up.

Actual behaviour

Everything is created but the redis pods are stuck in a "NotReady" state.

NAME                         READY   STATUS    RESTARTS   AGE
rfr-redis-0                  0/1     Running   0          2m22s
rfr-redis-1                  0/1     Running   0          2m22s
rfr-redis-2                  0/1     Running   0          2m22s
rfr-redis-3                  0/1     Running   0          2m22s
rfr-redis-4                  0/1     Running   0          2m22s
rfs-redis-54cdf79767-b6jc6   1/1     Running   0          2m22s
rfs-redis-54cdf79767-ltwjp   1/1     Running   0          2m22s
rfs-redis-54cdf79767-nfr7r   1/1     Running   0          2m22s
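
The 0/1 READY column means the redis containers are running but their readiness probe never passes. The failing probe events can be inspected with a standard describe (namespace taken from the spec below):

kubectl -n identity-redis-testing describe pod rfr-redis-0
# Look for "Readiness probe failed" events near the bottom of the output.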

Steps to reproduce the behaviour

Using the following config reproduces the issue:

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: redis
  namespace: identity-redis-testing
spec:
  sentinel:
    replicas: 3                              # Setting this to the same number as redis replicas fixes the issue
    resources:                               
      requests:
        cpu: 100m
      limits:
        memory: 500Mi
  redis:
    replicas: 5                              # Setting this to the same number as sentinel replicas fixes the issue
    resources:                            
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        memory: 500Mi
    customConfig:
      - "maxmemory 450mb"
      - "maxmemory-policy volatile-ttl"
    storage:
      keepAfterDeletion: false
      persistentVolumeClaim:
        metadata:
          name: redis-data
        spec:
          storageClassName: standard
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 10Gi
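
For completeness, this is how the manifest is applied and observed; a minimal sketch assuming it is saved as redisfailover.yaml:

# Apply the RedisFailover manifest and watch the pods (rfr-* = redis, rfs-* = sentinel)
kubectl apply -f redisfailover.yaml
kubectl get pods -n identity-redis-testing -w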

Environment

Operator Chart Version: redis-operator-3.2.8
App Version: 1.2.4
GKE Kubernetes version: v1.25.7:

Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.7-gke.1000", GitCommit:"0a719a43f6f42e7e4cc9f696a5a6a416da0b229e", GitTreeState:"clean", BuildDate:"2023-03-14T10:47:42Z", GoVersion:"go1.19.6 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}

Logs

This is the only log line from the operator:

time="2023-05-01T14:03:41Z" level=warning msg="set annotations,resize nothing" namespace=identity-redis-testing pvc=redis service=k8s.statefulSet src="client.go:102"

rfr-redis-0:

1:C 01 May 2023 14:03:48.852 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 01 May 2023 14:03:48.852 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 01 May 2023 14:03:48.852 # Configuration loaded
1:S 01 May 2023 14:03:48.852 * monotonic clock: POSIX clock_gettime
1:S 01 May 2023 14:03:48.853 * Running mode=standalone, port=6379.
1:S 01 May 2023 14:03:48.853 # Server initialized
1:S 01 May 2023 14:03:48.853 * Ready to accept connections
1:S 01 May 2023 14:03:48.854 * Connecting to MASTER 127.0.0.1:6379
1:S 01 May 2023 14:03:48.854 * MASTER <-> REPLICA sync started
1:S 01 May 2023 14:03:48.854 * Non blocking connect for SYNC fired the event.
1:S 01 May 2023 14:03:48.854 * Master replied to PING, replication can continue...
1:S 01 May 2023 14:03:48.854 * Partial resynchronization not possible (no cached master)
1:S 01 May 2023 14:03:48.854 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 01 May 2023 14:03:49.857 * Connecting to MASTER 127.0.0.1:6379
1:S 01 May 2023 14:03:49.857 * MASTER <-> REPLICA sync started
1:S 01 May 2023 14:03:49.857 * Non blocking connect for SYNC fired the event.
1:S 01 May 2023 14:03:49.857 * Master replied to PING, replication can continue...
1:S 01 May 2023 14:03:49.857 * Partial resynchronization not possible (no cached master)
1:S 01 May 2023 14:03:49.857 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
....
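
The loop above shows the replica still pointing at the bootstrap master 127.0.0.1. This can be confirmed from inside the pod; a sketch assuming the redis container in the StatefulSet is named redis:

kubectl -n identity-redis-testing exec rfr-redis-0 -c redis -- redis-cli info replication
# While the pod is NotReady this should show role:slave, master_host:127.0.0.1
# and master_link_status:down, matching the log above.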

One of the sentinel pods:

Defaulted container "sentinel" out of: sentinel, sentinel-config-copy (init)
1:X 01 May 2023 14:03:41.028 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 01 May 2023 14:03:41.028 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 01 May 2023 14:03:41.028 # Configuration loaded
1:X 01 May 2023 14:03:41.028 * monotonic clock: POSIX clock_gettime
1:X 01 May 2023 14:03:41.029 * Running mode=sentinel, port=26379.
1:X 01 May 2023 14:03:41.034 # Sentinel ID is 894d7ab411ba92124ca7e07634df05f8f17ec74a
1:X 01 May 2023 14:03:41.034 # +monitor master mymaster 127.0.0.1 6379 quorum 2
1:X 01 May 2023 14:03:42.031 # +sdown master mymaster 127.0.0.1 6379
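
Sentinel's view of the master can be queried the same way; the mymaster name and port 26379 are taken from the log above, and the container name sentinel from the kubectl default message:

kubectl -n identity-redis-testing exec rfs-redis-54cdf79767-b6jc6 -c sentinel -- redis-cli -p 26379 sentinel get-master-addr-by-name mymaster
# While bootstrapping is stuck this still returns 127.0.0.1 6379,
# consistent with the +sdown entry above.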

FYI: @jlcrow

kmcrawford commented 1 year ago

I see the issue: we are running build 1.2.4, which was built on 12/28/2022. I found a code fix for this issue from 1/18/2023: https://github.com/spotahome/redis-operator/commit/c4f0369cbddc7e55ec0328ab2fb916eda8cdb94f

This fix is not in v1.2.4, which explains why we are seeing this issue.
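
A quick way to confirm which operator build is actually running; the deployment name redis-operator is an assumption based on the chart name, adjust to your install:

kubectl -n <operator-namespace> get deployment redis-operator -o jsonpath='{.spec.template.spec.containers[0].image}'
# If this prints an image tagged v1.2.4, the commit above is not included.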

Can there be a new version of the image & chart pushed that includes this fix? @bwrobc @ese

Prudhvi0717 commented 1 year ago

This issue still persists even after updating my redis operator to 3.2.8, which had the fix for this issue. @bwrobc @ese

Prudhvi0717 commented 1 year ago

So the changes are merged to master, but the latest image was published 5 months ago. So this issue will still persist, right?

kmcrawford commented 1 year ago

A new image hasn’t been pushed since it was fixed.

kmcrawford commented 1 year ago

> This issue still persists even after updating my redis operator to 3.2.8, which had the fix for this issue. @bwrobc @ese

Version 3.2.8 of the chart still uses v1.2.4.
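
Which operator image a given chart release pins can be checked before upgrading; a sketch assuming the spotahome chart repo has been added under the alias redis-operator:

helm search repo redis-operator --versions | head
# The APP VERSION column shows which operator image tag each chart version ships.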

github-actions[bot] commented 11 months ago

This issue is stale because it has been open for 45 days with no activity.

kmcrawford commented 10 months ago

This shouldn't be stale; there is an issue in the code that has been fixed but not released.

github-actions[bot] commented 9 months ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 8 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

JulesdeCube commented 7 months ago

This shouldn't be stale; there is an issue in the code that has been fixed but not released.