**bsergean** opened this issue 4 years ago (status: Open)
I'm looking at the operator's logs, and I see tons of errors like these:
```
{"level":"info","ts":1590868431.809033,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1590868492.4941754,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
W0530 19:55:01.976103 1 reflector.go:299] pkg/mod/k8s.io/client-go@v0.0.0-20191016111102-bec269661e48/tools/cache/reflector.go:96: watch of *v1.Job ended with: very short watch: pkg/mod/k8s.io/client-go@v0.0.0-20191016111102-bec269661e48/tools/cache/reflector.go:96: Unexpected watch close - watch lasted less than a second and no items received
W0530 19:55:39.122065 1 reflector.go:299] pkg/mod/k8s.io/client-go@v0.0.0-20191016111102-bec269661e48/tools/cache/reflector.go:96: watch of *v1alpha1.RedisClusterBackup ended with: very short watch: pkg/mod/k8s.io/client-go@v0.0.0-20191016111102-bec269661e48/tools/cache/reflector.go:96: Unexpected watch close - watch lasted less than a second and no items received
{"level":"info","ts":1590868552.9949102,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1590868613.3899803,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
W0530 19:57:45.137573 1 reflector.go:299] pkg/mod/k8s.io/client-go@v0.0.0-20191016111102-bec269661e48/tools/cache/reflector.go:96: watch of *unstructured.Unstructured ended with: very short watch: pkg/mod/k8s.io/client-go@v0.0.0-20191016111102-bec269661e48/tools/cache/reflector.go:96: Unexpected watch close - watch lasted less than a second and no items received
```
Before the FORGET errors started, this error showed up:
```
{"level":"info","ts":1591115256.7091894,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115317.2034957,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115377.7051213,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115438.3920841,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115438.4025192,"logger":"controller_distributedrediscluster","msg":"waitPodReady","Request.Namespace":"cobra-bench","Request.Name":"redis","err":"CheckRedisNodeNum: redis pods are not all ready"}
{"level":"info","ts":1591115448.421646,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115448.4320798,"logger":"controller_distributedrediscluster","msg":"waitPodReady","Request.Namespace":"cobra-bench","Request.Name":"redis","err":"CheckRedisNodeNum: redis pods are not all ready"}
{"level":"info","ts":1591115458.4324322,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115458.443078,"logger":"controller_distributedrediscluster","msg":"waitPodReady","Request.Namespace":"cobra-bench","Request.Name":"redis","err":"CheckRedisNodeNum: redis pods are not all ready"}
{"level":"info","ts":1591115468.4434261,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115468.4537761,"logger":"controller_distributedrediscluster","msg":"waitPodReady","Request.Namespace":"cobra-bench","Request.Name":"redis","err":"CheckRedisNodeNum: redis pods are not all ready"}
{"level":"info","ts":1591115478.4540656,"logger":"controller_distributedrediscluster","msg":"Reconciling DistributedRedisCluster","Request.Namespace":"cobra-bench","Request.Name":"redis"}
{"level":"info","ts":1591115478.4647434,"logger":"controller_distributedrediscluster","msg":"waitPodReady","Request.Namespace":"cobra-bench","Request.Name":"redis","err":"CheckRedisNodeNum: redis pods are not all ready"}
```
Then one message about "Forgetting failed node, this command might fail, this is not an error":

```
{"level":"info","ts":1591115690.2306173,"logger":"controller_distributedrediscluster","msg":"[FixFailedNodes] Forgetting failed node, this command might fail, this is not an error","Request.Namespace":"cobra-bench","Request.Name":"redis","node":"6cb0b32912a84fd7c343e53f7523ac91b8a5f0c9"}
```
Then the usual:

```
{"level":"info","ts":1591115690.2306411,"logger":"controller_distributedrediscluster","msg":"[FixFailedNodes] try to forget node","Request.Namespace":"cobra-bench","Request.Name":"redis","nodeId":"6cb0b32912a84fd7c343e53f7523ac91b8a5f0c9"}
{"level":"info","ts":1591115690.4067843,"logger":"controller_distributedrediscluster.redis_util","msg":"CLUSTER FORGET","Request.Namespace":"cobra-bench","Request.Name":"redis","id":"6cb0b32912a84fd7c343e53f7523ac91b8a5f0c9","from":"172.28.5.80:6379"}
{"level":"info","ts":1591115690.4896011,"logger":"controller_distributedrediscluster.redis_util","msg":"CLUSTER FORGET","Request.Namespace":"cobra-bench","Request.Name":"redis","id":"6cb0b32912a84fd7c343e53f7523ac91b8a5f0c9","from":"172.29.197.204:6379"}
```
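For reference, the same thing can be checked by hand: `CLUSTER NODES` marks dead nodes with a `fail` flag, and those are the IDs the operator keeps issuing `CLUSTER FORGET` for. A minimal sketch (the sample output, node IDs, and addresses below are made up):

```shell
# Hypothetical CLUSTER NODES output; normally produced by:
#   redis-cli -h <pod-ip> -p 6379 cluster nodes
nodes_output='6cb0b32912a84fd7c343e53f7523ac91b8a5f0c9 172.28.5.99:6379@16379 master,fail - 1591115690 1591115690 3 disconnected
aa11bb22cc33dd44ee55ff6677889900aabbccdd 172.28.5.80:6379@16379 myself,master - 0 1591115691 1 connected 0-5460'

# Field 3 is the flags column; pick node IDs whose flags include "fail"
# (confirmed failure, not the tentative "fail?" / PFAIL state).
failed_ids=$(printf '%s\n' "$nodes_output" | awk '$3 ~ /(^|,)fail(,|$)/ {print $1}')
echo "$failed_ids"

# The operator then runs, for each failed ID, against every healthy node:
#   redis-cli -h <healthy-node-ip> cluster forget <failed-id>
```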
Note that the log is enormous now: about 300 MB of that same error message.
By the way, when testing I manually rolled all the shard StatefulSets one by one by adding a nodeSelector. The operator didn't pick this up as a change to the CRD, and this issue started happening too.
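The manual roll was along these lines (the StatefulSet name, namespace, and label below are illustrative, not the exact ones I used):

```shell
# Illustrative only: patching the pod template makes Kubernetes roll the
# shard's pods one by one; the operator may later reconcile the StatefulSet
# back to the spec derived from the CRD, since the CRD itself never changed.
kubectl -n cobra-bench patch statefulset drc-redis-0 --type merge -p '
spec:
  template:
    spec:
      nodeSelector:
        example.com/node-pool: redis
'
kubectl -n cobra-bench rollout status statefulset/drc-redis-0
```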
My cluster got into a bad state, and it's unclear why. There were 0 pod restarts, but I see 4 masters and their slaves in the `CLUSTER NODES` output.
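This is how I summarize the topology by role from `CLUSTER NODES` (the sample output below is made up; the real input comes from `redis-cli`):

```shell
# Hypothetical CLUSTER NODES output; normally produced by:
#   redis-cli -h <pod-ip> -p 6379 cluster nodes
nodes_output='id1 172.28.5.80:6379@16379 myself,master - 0 0 1 connected 0-5460
id2 172.28.5.81:6379@16379 master - 0 0 2 connected 5461-10922
id3 172.28.5.82:6379@16379 slave id1 0 0 1 connected
id4 172.28.5.83:6379@16379 slave id2 0 0 2 connected'

# Field 3 is the flags column; count masters vs slaves.
summary=$(printf '%s\n' "$nodes_output" \
  | awk '$3 ~ /master/ {m++} $3 ~ /slave/ {s++} END {printf "masters=%d slaves=%d", m, s}')
echo "$summary"
```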
Any ideas or guesses on how to investigate this?