spotahome / redis-operator

Redis Operator creates/configures/manages high availability redis with sentinel automatic failover atop Kubernetes.
Apache License 2.0

rfr-redisfailover Readiness probe failed #476

Closed: seekelvis closed this issue 1 year ago

seekelvis commented 2 years ago

Expected behaviour

The rfr-redisfailover pods are running.

Actual behaviour

The rfr-redisfailover pods are not running.

kubectl get all
NAME                                     READY   STATUS    RESTARTS   AGE
pod/redisoperator-78c9d88948-555wz       1/1     Running   0          95m
pod/rfr-redisfailover-0                  0/1     Running   0          61s
pod/rfr-redisfailover-1                  0/1     Running   0          61s
pod/rfr-redisfailover-2                  0/1     Running   0          61s
pod/rfs-redisfailover-6b8648d584-r6qtn   1/1     Running   0          61s
pod/rfs-redisfailover-6b8648d584-stjb5   1/1     Running   0          61s
pod/rfs-redisfailover-6b8648d584-zln7d   1/1     Running   0          61s

NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
service/kubernetes          ClusterIP   10.96.0.1        <none>        443/TCP     110m
service/rfs-redisfailover   ClusterIP   10.107.189.183   <none>        26379/TCP   61s

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/redisoperator       1/1     1            1           95m
deployment.apps/rfs-redisfailover   3/3     3            3           61s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/redisoperator-78c9d88948       1         1         1       95m
replicaset.apps/rfs-redisfailover-6b8648d584   3         3         3       61s

NAME                                 READY   AGE
statefulset.apps/rfr-redisfailover   0/3     61s

Steps to reproduce the behaviour

kubectl apply -f basic.yaml

apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: redisfailover
spec:
  sentinel:
    replicas: 3
    resources:
      requests:
        cpu: 100m
      limits:
        memory: 100Mi
  redis:
    replicas: 3
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 400m
        memory: 500Mi
  auth:
    secretPath: redis-auth
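
For reference, `auth.secretPath` points at a Kubernetes Secret that must already exist in the same namespace before the pods can come up; per the redis-operator README the password is expected under the `password` key. A minimal sketch of such a Secret (the password value here is a placeholder, not from this issue):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: redis-auth          # must match spec.auth.secretPath
type: Opaque
stringData:
  password: change-me       # placeholder value
```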

Environment

How are the pieces configured?

Logs

kubectl describe pod/rfr-redisfailover-0

...
Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  2m3s               default-scheduler  Successfully assigned default/rfr-redisfailover-0 to node01
  Normal   Pulling    2m1s               kubelet            Pulling image "redis:6.2.6-alpine"
  Normal   Pulled     115s               kubelet            Successfully pulled image "redis:6.2.6-alpine" in 5.807746669s
  Normal   Created    115s               kubelet            Created container redis
  Normal   Started    115s               kubelet            Started container redis
  Warning  Unhealthy  2s (x10 over 83s)  kubelet            Readiness probe failed:

kubectl logs pod/rfr-redisfailover-0

1:C 15 Sep 2022 17:10:05.475 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:C 15 Sep 2022 17:10:05.475 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:C 15 Sep 2022 17:10:05.475 # Configuration loaded
1:S 15 Sep 2022 17:10:05.475 * monotonic clock: POSIX clock_gettime
1:S 15 Sep 2022 17:10:05.476 * Running mode=standalone, port=6379.
1:S 15 Sep 2022 17:10:05.476 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:S 15 Sep 2022 17:10:05.476 # Server initialized
1:S 15 Sep 2022 17:10:05.476 * Ready to accept connections
1:S 15 Sep 2022 17:10:05.477 * Connecting to MASTER 127.0.0.1:6379
1:S 15 Sep 2022 17:10:05.477 * MASTER <-> REPLICA sync started
1:S 15 Sep 2022 17:10:05.477 * Non blocking connect for SYNC fired the event.
1:S 15 Sep 2022 17:10:05.477 * Master replied to PING, replication can continue...
1:S 15 Sep 2022 17:10:05.478 * Partial resynchronization not possible (no cached master)
1:S 15 Sep 2022 17:10:05.478 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
1:S 15 Sep 2022 17:10:06.480 * Connecting to MASTER 127.0.0.1:6379
1:S 15 Sep 2022 17:10:06.480 * MASTER <-> REPLICA sync started
1:S 15 Sep 2022 17:10:06.480 * Non blocking connect for SYNC fired the event.
1:S 15 Sep 2022 17:10:06.480 * Master replied to PING, replication can continue...
1:S 15 Sep 2022 17:10:06.480 * Partial resynchronization not possible (no cached master)
1:S 15 Sep 2022 17:10:06.480 * Master is currently unable to PSYNC but should be in the future: -NOMASTERLINK Can't SYNC while not connected with my master
...

kubectl logs rfs-redisfailover-6b8648d584-r6qtn

1:X 15 Sep 2022 17:10:08.499 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
1:X 15 Sep 2022 17:10:08.499 # Redis version=6.2.6, bits=64, commit=00000000, modified=0, pid=1, just started
1:X 15 Sep 2022 17:10:08.499 # Configuration loaded
1:X 15 Sep 2022 17:10:08.499 * monotonic clock: POSIX clock_gettime
1:X 15 Sep 2022 17:10:08.500 * Running mode=sentinel, port=26379.
1:X 15 Sep 2022 17:10:08.500 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
1:X 15 Sep 2022 17:10:08.510 # Sentinel ID is f5036ba023ce94ab259588a3d84e8997af7d835e
1:X 15 Sep 2022 17:10:08.510 # +monitor master mymaster 127.0.0.1 6379 quorum 2
1:X 15 Sep 2022 17:10:09.512 # +sdown master mymaster 127.0.0.1 6379
seekelvis commented 2 years ago

It seems all the rfr-redisfailover pods are slaves now.
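
That can be confirmed by asking each pod for its replication role directly. A sketch (assumes the redis-auth Secret stores the password under the `password` key and that the redis container is named `redis`, as in the pod events above; this needs a live cluster to run):

```shell
# Read the password from the redis-auth Secret (key name is an assumption).
PASS=$(kubectl get secret redis-auth -o jsonpath='{.data.password}' | base64 -d)
for i in 0 1 2; do
  echo "rfr-redisfailover-$i:"
  kubectl exec "rfr-redisfailover-$i" -c redis -- \
    redis-cli -a "$PASS" --no-auth-warning info replication | grep '^role:'
done
```

If every pod reports `role:slave`, no master has been elected and the readiness probes will keep failing.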

seekelvis commented 2 years ago

The issue is similar to #412, but I still cannot find a solution.

ese commented 2 years ago

Could you paste any relevant logs from the operator pod (redisoperator-78c9d88948-555wz)?

seekelvis commented 2 years ago
time="2022-09-15T15:35:53Z" level=info msg="Listening on :9710 for metrics exposure" src="asm_amd64.s:1581"
time="2022-09-15T15:35:53Z" level=info msg="starting controller" controller-id=redisfailover operator=redisfailover service=kooper.controller src="controller.go:233"
time="2022-09-15T15:36:26Z" level=info msg="service created" namespace=default service=k8s.service serviceName=rfs-redisfailover src="service.go:61"
time="2022-09-15T15:36:27Z" level=info msg="configMap created" configMap=rfs-redisfailover namespace=default service=k8s.configMap src="configmap.go:68"
time="2022-09-15T15:36:27Z" level=info msg="configMap created" configMap=rfr-s-redisfailover namespace=default service=k8s.configMap src="configmap.go:68"
time="2022-09-15T15:36:27Z" level=info msg="configMap created" configMap=rfr-readiness-redisfailover namespace=default service=k8s.configMap src="configmap.go:68"
time="2022-09-15T15:36:27Z" level=info msg="configMap created" configMap=rfr-redisfailover namespace=default service=k8s.configMap src="configmap.go:68"
W0915 15:36:27.146989       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0915 15:36:27.151321       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
time="2022-09-15T15:36:27Z" level=info msg="podDisruptionBudget created" namespace=default podDisruptionBudget=rfr-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:69"
time="2022-09-15T15:36:27Z" level=info msg="statefulSet created" namespace=default service=k8s.statefulSet src="statefulset.go:92" statefulSet=rfr-redisfailover
W0915 15:36:27.344713       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0915 15:36:27.347547       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
time="2022-09-15T15:36:27Z" level=info msg="podDisruptionBudget created" namespace=default podDisruptionBudget=rfs-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:69"
time="2022-09-15T15:36:27Z" level=info msg="deployment created" deployment=rfs-redisfailover namespace=default service=k8s.deployment src="deployment.go:92"
time="2022-09-15T15:36:53Z" level=info msg="configMap updated" configMap=rfs-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
time="2022-09-15T15:36:53Z" level=info msg="configMap updated" configMap=rfr-s-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
time="2022-09-15T15:36:53Z" level=info msg="configMap updated" configMap=rfr-readiness-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
time="2022-09-15T15:36:53Z" level=info msg="configMap updated" configMap=rfr-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
W0915 15:36:53.606013       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0915 15:36:53.610646       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
time="2022-09-15T15:36:53Z" level=info msg="podDisruptionBudget updated" namespace=default podDisruptionBudget=rfr-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:79"
time="2022-09-15T15:36:53Z" level=info msg="statefulSet updated" namespace=default service=k8s.statefulSet src="statefulset.go:102" statefulSet=rfr-redisfailover
W0915 15:36:53.618265       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0915 15:36:53.620573       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
time="2022-09-15T15:36:53Z" level=info msg="podDisruptionBudget updated" namespace=default podDisruptionBudget=rfs-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:79"
time="2022-09-15T15:36:53Z" level=info msg="deployment updated" deployment=rfs-redisfailover namespace=default service=k8s.deployment src="deployment.go:102"
time="2022-09-15T15:36:58Z" level=error msg="error on object processing: dial tcp 192.168.140.66:6379: i/o timeout" controller-id=redisfailover object-key=default/redisfailover operator=redisfailover service=kooper.controller src="controller.go:279"
time="2022-09-15T15:37:23Z" level=info msg="configMap updated" configMap=rfs-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
time="2022-09-15T15:37:23Z" level=info msg="configMap updated" configMap=rfr-s-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
time="2022-09-15T15:37:23Z" level=info msg="configMap updated" configMap=rfr-readiness-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
time="2022-09-15T15:37:23Z" level=info msg="configMap updated" configMap=rfr-redisfailover namespace=default service=k8s.configMap src="configmap.go:78"
W0915 15:37:23.656240       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0915 15:37:23.658631       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
time="2022-09-15T15:37:23Z" level=info msg="podDisruptionBudget updated" namespace=default podDisruptionBudget=rfr-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:79"
time="2022-09-15T15:37:23Z" level=info msg="statefulSet updated" namespace=default service=k8s.statefulSet src="statefulset.go:102" statefulSet=rfr-redisfailover
W0915 15:37:23.665736       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
W0915 15:37:23.744248       1 warnings.go:70] policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
time="2022-09-15T15:37:23Z" level=info msg="podDisruptionBudget updated" namespace=default podDisruptionBudget=rfs-redisfailover service=k8s.podDisruptionBudget src="poddisruptionbudget.go:79"
time="2022-09-15T15:37:23Z" level=info msg="deployment updated" deployment=rfs-redisfailover namespace=default service=k8s.deployment src="deployment.go:102"
time="2022-09-15T15:37:28Z" level=error msg="error on object processing: dial tcp 192.168.140.66:6379: i/o timeout" controller-id=redisfailover object-key=default/redisfailover operator=redisfailover service=kooper.controller src="controller.go:279"

@ese

ese commented 2 years ago

It seems redis-operator cannot connect to the redis instances to configure them:

time="2022-09-15T15:37:28Z" level=error msg="error on object processing: dial tcp 192.168.140.66:6379: i/o timeout" controller-id=redisfailover object-key=default/redisfailover operator=redisfailover service=kooper.controller src="controller.go:279"

Redis instances bootstrap as slaves of themselves until redis-operator takes control and configures the cluster. The readiness probe will fail until they are configured by redis-operator.

What kind of Kubernetes deployment are you using: GKE, kind, kops, ...? What CNI are you using? Do you have any network policies in place?

marcbachmann commented 2 years ago

I also ran into this issue because of a network policy. Maybe the status could be reported back onto the RedisFailover resource for better transparency. The timeout is currently configured at 30s; maybe decreasing it would make the failure visible sooner. A 10s timeout would already be good enough.

time="2022-10-11T10:47:29Z" level=error msg="error on object processing: dial tcp 10.42.0.66:6379: i/o timeout" controller-id=redisfailover object-key=somenamespace/redis operator=redisfailover service=kooper.controller src="controller.go:279"
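
For the network-policy variant of this problem: the operator (and the sentinels) must be able to reach the redis pods on port 6379. An illustrative NetworkPolicy that would admit that traffic; the label selector is a placeholder and must be adapted to the labels actually set on your rfr-* pods:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-redis-ingress
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: redis   # placeholder: match your rfr-* pod labels
  ingress:
    - from:
        - podSelector: {}                  # any pod in this namespace; add a
                                           # namespaceSelector if the operator
                                           # runs in another namespace
      ports:
        - protocol: TCP
          port: 6379
```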
TMInnovations commented 2 years ago

I have the same problem. The rfr service doesn't get deployed by the operator and thus isn't found, which logs errors.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 45 days with no activity.

github-actions[bot] commented 1 year ago

This issue was closed because it has been inactive for 14 days since being marked as stale.

sanscfs commented 1 year ago

bump

meltingrock commented 10 months ago

Same issue.