spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

Sentinel cluster fails to come up on k8s 1.21.4 (1.21.6 works) #612

Closed davidtinker closed 11 months ago

davidtinker commented 1 year ago

Expected behaviour

Sentinels and Redis failovers start.

Actual behaviour

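The Sentinels start but do not come up properly: each one monitors the master at 127.0.0.1:6379 and marks it as down (+sdown). Interestingly, the same manifest works on a k8s 1.21.6 cluster but not on 1.21.4.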

The operator logs this error:

[redis-operator-7c4d99778d-jnj5h] time="2023-06-01T09:25:36Z" level=error msg="error on object processing: Service \"rfs-borage\" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.ipFamilies[0]: Invalid value: []core.IPFamily(nil): primary ipFamily can not be unset]" controller-id=redisfailover object-key=default/borage operator=redisfailover service=kooper.controller src="controller.go:279"
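
The "primary clusterIP can not be unset" message appears to come from the API server's validation of an update to an existing Service, which suggests the submitted spec lacked the server-assigned `clusterIPs` and `ipFamilies` fields. Below is a minimal Python sketch of carrying those fields over from the live object before sending an update. It is an illustration under assumptions, not the operator's actual code (the operator is written in Go): `preserve_assigned_fields` is a hypothetical helper, and the objects are plain dicts rather than real client types.

```python
import copy

# Spec fields that the API server fills in when the Service is created.
# Assumption: an update that leaves them empty is what triggers the
# "can not be unset" validation errors seen in the operator log.
SERVER_ASSIGNED_FIELDS = ("clusterIP", "clusterIPs", "ipFamilies", "ipFamilyPolicy")


def preserve_assigned_fields(live, desired):
    """Return a copy of `desired` with server-assigned fields taken from `live`.

    Hypothetical helper: `live` is the Service as read from the cluster,
    `desired` is the Service the controller wants to apply.
    """
    merged = copy.deepcopy(desired)
    spec = merged.setdefault("spec", {})
    live_spec = live.get("spec", {})
    for field in SERVER_ASSIGNED_FIELDS:
        # Only fill in fields the desired spec leaves empty or missing.
        if field in live_spec and not spec.get(field):
            spec[field] = copy.deepcopy(live_spec[field])
    # Carry over resourceVersion so the update targets the object that was read.
    version = live.get("metadata", {}).get("resourceVersion")
    if version is not None:
        merged.setdefault("metadata", {})["resourceVersion"] = version
    return merged


# Values taken from the rfs-borage Service shown in this issue.
live = {
    "metadata": {"name": "rfs-borage", "resourceVersion": "601515806"},
    "spec": {
        "clusterIP": "10.100.7.89",
        "clusterIPs": ["10.100.7.89"],
        "ipFamilies": ["IPv4"],
        "ipFamilyPolicy": "SingleStack",
        "type": "ClusterIP",
    },
}
# A freshly generated spec that knows nothing about the assigned IPs.
desired = {
    "metadata": {"name": "rfs-borage"},
    "spec": {
        "type": "ClusterIP",
        "ports": [{"name": "sentinel", "port": 26379, "targetPort": 26379}],
    },
}

merged = preserve_assigned_fields(live, desired)
print(merged["spec"]["clusterIPs"], merged["spec"]["ipFamilies"])
```

The helper copies rather than mutates `desired`, so the generated spec can still be compared against the live object afterwards.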

The Sentinels log this:

sentinel 1:X 01 Jun 2023 09:25:19.288 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
sentinel 1:X 01 Jun 2023 09:25:19.288 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
sentinel 1:X 01 Jun 2023 09:25:19.288 # Configuration loaded
sentinel 1:X 01 Jun 2023 09:25:19.289 * monotonic clock: POSIX clock_gettime
sentinel 1:X 01 Jun 2023 09:25:19.289 * Running mode=sentinel, port=26379.
sentinel 1:X 01 Jun 2023 09:25:19.290 # Sentinel ID is 9edbe61e2d3824ec1674077b6df6d78f96a07bac
sentinel 1:X 01 Jun 2023 09:25:19.290 # +monitor master mymaster 127.0.0.1 6379 quorum 2
sentinel 1:X 01 Jun 2023 09:25:20.334 # +sdown master mymaster 127.0.0.1 6379

The service looks fine:

kubectl get service rfs-borage -o yaml

apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2023-06-01T09:25:15Z"
  labels:
    app.kubernetes.io/component: sentinel
    app.kubernetes.io/managed-by: redis-operator
    app.kubernetes.io/name: borage
    app.kubernetes.io/part-of: redis-failover
    redisfailovers.databases.spotahome.com/name: borage
  name: rfs-borage
  namespace: default
  ownerReferences:
  - apiVersion: databases.spotahome.com/v1
    blockOwnerDeletion: true
    controller: true
    kind: RedisFailover
    name: borage
    uid: 989957b9-b316-4186-a253-ec28c419073a
  resourceVersion: "601515806"
  uid: 13ece87c-6abf-48d5-aea9-daef2e81be55
spec:
  clusterIP: 10.100.7.89
  clusterIPs:
  - 10.100.7.89
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - name: sentinel
    port: 26379
    protocol: TCP
    targetPort: 26379
  selector:
    app.kubernetes.io/component: sentinel
    app.kubernetes.io/name: borage
    app.kubernetes.io/part-of: redis-failover
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

Steps to reproduce the behaviour

cat <<EOF | kubectl apply -f -
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: borage
spec:
  sentinel:
    replicas: 3
  redis:
    replicas: 3
    storage:
      keepAfterDeletion: true
      persistentVolumeClaim:
        metadata:
          name: 'redis'
        spec:
          storageClassName: 'openebs-hostpath'
          accessModes:
            - ReadWriteOnce
          resources:
            requests:
              storage: 500Mi
EOF

Environment

Kubernetes 1.21.4 (fails) and 1.21.6 (works); Redis 6.2.6 (per the Sentinel log above).

Logs

time="2023-06-01T09:42:04Z" level=info msg="Listening on :9710 for metrics exposure on URL /metrics" src="asm_amd64.s:1594"
time="2023-06-01T09:42:04Z" level=info msg="running in leader election mode, waiting to acquire leadership..." leader-election-id=redis-operator/redis-failover-lease operator=redisfailover source-service=kooper/leader-election src="controller.go:228"
I0601 09:42:04.523833       1 leaderelection.go:248] attempting to acquire leader lease redis-operator/redis-failover-lease...
I0601 09:42:38.373414       1 leaderelection.go:258] successfully acquired lease redis-operator/redis-failover-lease
time="2023-06-01T09:42:38Z" level=info msg="lead acquire, starting..." leader-election-id=redis-operator/redis-failover-lease operator=redisfailover source-service=kooper/leader-election src="asm_amd64.s:1594"
time="2023-06-01T09:42:38Z" level=info msg="starting controller" controller-id=redisfailover operator=redisfailover service=kooper.controller src="controller.go:229"
time="2023-06-01T09:42:38Z" level=error msg="error on object processing: Service \"rfs-borage\" is invalid: [spec.clusterIPs[0]: Invalid value: []string(nil): primary clusterIP can not be unset, spec.ipFamilies[0]: Invalid value: []core.IPFamily(nil): primary ipFamily can not be unset]" controller-id=redisfailover object-key=default/borage operator=redisfailover service=kooper.controller src="controller.go:279"

github-actions[bot] commented 11 months ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 11 months ago

This issue was closed because it has been inactive for 14 days since being marked as stale.