Status: Open. MAXEE998 opened 1 year ago
According to the log, the problematic pod keeps trying to get rate limits from the peer that was shut down:
rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 100.103.255.29:81: i/o timeout"
I don't run a k8s cluster, so I really don't have a way to test this. I rely on the community to provide support for k8s.
FYI, this isn't limited to k8s. We run on ECS and see something similar. These logs seem to coincide with our deployments.
time="2023-10-25T23:47:56Z" level=error msg="error sending global hits to '10.0.37.143:9990'" category=gubernator error="Error in client.GetPeerRateLimits: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp 10.0.37.143:9990: connect: connection refused\""
I need to do some research on my end to see whether it's a bug in our service or in this library.
We ran a three-replica gubernator setup in our k8s cluster. When one pod was shut down gracefully by k8s, another pod (just one, not all of them) kept reporting connection errors to the terminated peer in the log.
Apparently, it didn't update its peer list accordingly. What may be the cause of this problem?