Describe the bug
One of our ingester instances cannot start. Its logs show the following:
msg="found an existing instance(s) with a problem in the ring, this instance cannot become ready until this problem is resolved. The /ring http endpoint on the distributor (or single binary) provides visibility into the ring." ring=ingester err="instance 10.0.40.84:9095 past heartbeat timeout"
"10.0.40.84:9095" was a short-lived ingester instance, and it looks like it was never removed from the ring. It shows up as a very short line in the image below, around 14:53.
The last log lines of ingester "10.0.40.84:9095" are as follows:
caller=memberlist_client.go:899 msg="skipped broadcasting CAS update because memberlist KV is shutting down" key=collectors/ring
caller=module_service.go:114 msg="module stopped" module=ring
caller=lifecycler.go:416 msg="auto-joining cluster after timeout" ring=ingester
caller=lifecycler.go:576 msg="instance not found in ring, adding with no tokens" ring=ingester
To Reproduce
Steps to reproduce the behavior:
Start Loki with multiple ingesters.
Add a new ingester, then delete it shortly afterwards.
Expected behavior
When an ingester is removed, the ring should update its records so that other nodes do not fail to start.
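To make the expectation concrete, here is a rough sketch of the check behind "past heartbeat timeout" and of the cleanup I would expect. This is not Loki's actual implementation: `RingEntry`, `stale_instances`, and `forget_stale` are hypothetical names, and the timestamps, the timeout value, and the second address are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class RingEntry:
    """Hypothetical, simplified view of one instance in the ring."""
    addr: str
    last_heartbeat: float  # Unix seconds of the last heartbeat seen


def stale_instances(ring, now, heartbeat_timeout):
    """Return addresses whose last heartbeat is older than the timeout."""
    return [e.addr for e in ring if now - e.last_heartbeat > heartbeat_timeout]


def forget_stale(ring, now, heartbeat_timeout):
    """Return a new ring without the stale entries (the expected cleanup)."""
    stale = set(stale_instances(ring, now, heartbeat_timeout))
    return [e for e in ring if e.addr not in stale]


# Illustrative data only: the short-lived ingester stopped heartbeating long ago.
ring = [
    RingEntry("10.0.40.84:9095", last_heartbeat=0.0),
    RingEntry("10.0.40.10:9095", last_heartbeat=95.0),  # made-up healthy peer
]
cleaned = forget_stale(ring, now=100.0, heartbeat_timeout=60.0)
```

With the stale entry dropped from `cleaned`, a newly starting ingester would no longer see a problem instance in the ring.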
Environment:
Infrastructure: Kubernetes (AWS EKS 1.29)
Deployment tool: helm
Screenshots, Promtail config, or terminal output