Thanks for your suggestion. That would make remote write a lot less efficient. It seems like something to tackle at the load balancer level, or the store proxies could themselves reject requests when they are overloaded.
You also mention that this happens 'when the number of requests is low'; in that case I would not expect the current behaviour to cause any issues in practice.
cc @cstyan @csmarchbanks
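For illustration only, "reject transactions if it is overloaded" could be as simple as an in-flight request limit in front of the receive handler; the limit, port, and handler below are hypothetical, not existing code:

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// overloadGuard rejects requests with 503 once more than maxInFlight
// requests are being handled concurrently (the threshold is illustrative).
func overloadGuard(maxInFlight int64, next http.Handler) http.Handler {
	var inFlight int64
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt64(&inFlight, 1) > maxInFlight {
			atomic.AddInt64(&inFlight, -1)
			http.Error(w, "proxy overloaded", http.StatusServiceUnavailable)
			return
		}
		defer atomic.AddInt64(&inFlight, -1)
		next.ServeHTTP(w, r)
	})
}

func main() {
	receive := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// ... forward the remote write payload to storage ...
		w.WriteHeader(http.StatusNoContent)
	})
	http.ListenAndServe(":9201", overloadGuard(100, receive))
}
```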
Release the Kraken lol
Hello,
Are you deploying a load balancer as your diagrams suggest? If so, this is likely an issue with the load balancer as the balancer should be routing the connections to the appropriate store proxies.
I have certainly seen this behavior when using a ClusterIP service in Kubernetes. I would not prefer periodic timeouts, as that is just a workaround. I would consider adding client-side DNS load balancing if there is an easy-to-use library for it, similar to what you can do with gRPC. Otherwise, I would suggest using a proxy that already has some sort of client-side load balancing built in.
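As a rough sketch of what client-side DNS load balancing could look like with the standard library alone (the service name and port are assumptions, and this is not how Prometheus remote write is wired today), re-resolving on each new connection lets traffic reach pods added behind a headless service:

```go
package main

import (
	"context"
	"math/rand"
	"net"
	"net/http"
	"time"
)

// dnsBalancedTransport re-resolves the host on every new connection and
// dials a randomly chosen A record, so new pods behind a headless
// service start receiving connections without restarting the client.
func dnsBalancedTransport() *http.Transport {
	dialer := &net.Dialer{Timeout: 5 * time.Second}
	return &http.Transport{
		IdleConnTimeout: 30 * time.Second, // let idle connections expire so traffic can reshuffle
		DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
			host, port, err := net.SplitHostPort(addr)
			if err != nil {
				return nil, err
			}
			ips, err := net.DefaultResolver.LookupIPAddr(ctx, host)
			if err != nil {
				return nil, err
			}
			ip := ips[rand.Intn(len(ips))].IP.String()
			return dialer.DialContext(ctx, network, net.JoinHostPort(ip, port))
		},
	}
}

func main() {
	client := &http.Client{Transport: dnsBalancedTransport()}
	client.Get("http://store-proxy.example:9201/receive") // hypothetical endpoint
}
```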
@roidelapluie @csmarchbanks Thank you for your advice. The store proxy is stateless; it would become very complicated if it had to consider whether it is overloaded. We use a hardware load balancer because we build storage as a service, and many clients are deployed outside the k8s cluster. I really agree with @csmarchbanks's idea: a client-side load balancer is the best choice. But clients cannot connect to the proxy pods directly at present because the clients and the proxies are in different VPCs.
@csmarchbanks Well, it turns out we have service discovery already, so why would we reinvent something? Could we reuse service discovery, like we have for Alertmanager, and reshard when the targets change?
I would argue that using service discovery is reinventing the wheel, as SRV records are commonly used for this already, including support for priorities. I can see how reusing that code could be nice though. Ideally we could find a package that does the work for us, e.g. https://pkg.go.dev/github.com/markdingo/cslb, but that library doesn't seem to be used much.
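For a rough sense of the SRV-based approach (the service name and domain below are made up for the example), Go's standard resolver already returns SRV records sorted by priority and randomized by weight:

```go
package main

import (
	"fmt"
	"net"
)

func main() {
	// Look up SRV records for a hypothetical remote write receiver.
	// net.LookupSRV returns records sorted by priority and randomized
	// by weight within each priority, so iterating in order already
	// gives basic prioritized load balancing.
	_, addrs, err := net.LookupSRV("remote-write", "tcp", "example.com")
	if err != nil {
		panic(err)
	}
	for _, srv := range addrs {
		fmt.Printf("target %s:%d priority=%d weight=%d\n",
			srv.Target, srv.Port, srv.Priority, srv.Weight)
	}
}
```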
If we used service discovery I imagine we would have to manually implement health checks/retries/etc, which we don't have to do for Alertmanager since we send to all instances in that case rather than a random one. I am not totally opposed, just bringing up a couple concerns I have with that approach :)
Also, @Sniper91 would you be ok with renaming this issue to supporting client side load balancing rather than resetting http connections periodically?
> If we used service discovery I imagine we would have to manually implement health checks/retries/etc, which we don't have to do for Alertmanager since we send to all instances in that case rather than a random one. I am not totally opposed, just bringing up a couple concerns I have with that approach :)
I would argue that health checking is the responsibility of the service discovery mechanism (Kubernetes, Consul). We already handle retries, if I am correct.
But for the rest I agree that it is a different use case that we could manage differently.
> I would argue that health checking is the responsibility of the service discovery mechanism (Kubernetes, Consul). We already handle retries, if I am correct.
Health checking is not the responsibility of service discovery in Prometheus; in fact, it's considered a problem if we're not returning targets merely because something is reporting them as unhealthy. Scraping should always try to scrape everything, especially if it's unhealthy, starting, or shutting down.
So based on https://github.com/prometheus/prometheus/blob/df80dc4d3970121f2f76cba79050983ffb3cdbb0/notifier/notifier.go#L451 it appears that Alertmanager notifications are sent to all instances and considered successful if they are received by at least one Alertmanager. That seems fine for Alertmanager. Trying to push this conversation along: would a simple round robin work?
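To make the "simple round robin" concrete, here is a minimal sketch of a RoundTripper that rotates the request host over a fixed list of receiver addresses; the addresses and endpoint path are placeholders, and this is not existing Prometheus code:

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// roundRobinTransport sends each request to the next address in the
// list, delegating the actual dialing to http.DefaultTransport.
type roundRobinTransport struct {
	hosts []string // e.g. "receiver-0:9201", "receiver-1:9201"
	next  uint64
}

func (t *roundRobinTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	i := atomic.AddUint64(&t.next, 1) % uint64(len(t.hosts))
	// Clone so we do not mutate the caller's request.
	r := req.Clone(req.Context())
	r.URL.Host = t.hosts[i]
	r.Host = t.hosts[i]
	return http.DefaultTransport.RoundTrip(r)
}

func main() {
	client := &http.Client{Transport: &roundRobinTransport{
		hosts: []string{"receiver-0:9201", "receiver-1:9201"},
	}}
	client.Post("http://placeholder/receive", "application/x-protobuf", nil) // host is rewritten per request
}
```

The default transport keeps a separate keep-alive pool per host, so rotating the host per request spreads load across receivers even though individual TCP connections are reused.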
I think it'd be nice to be able to configure a list of remote write targets (as we do with Alertmanager) along with a health-check endpoint, and load balance through round robin (or whatever other method, through a generic interface). Although we could use a library for all of this, in my opinion it would be reasonable to implement it ourselves, which would have the added benefit of allowing more advanced features such as taking targets' load into account when making routing decisions (just an idea). Also, I don't think there are really any libraries out there that do everything we want.
I'd be willing to work on this :)
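To sketch what a "generic interface" plus a health-check endpoint might look like (all names, the /-/healthy path, and the probe interval below are invented for illustration, not an existing Prometheus API):

```go
package main

import (
	"net/http"
	"sync"
	"time"
)

// Balancer picks the next target to send a remote write request to.
type Balancer interface {
	Pick() (target string, ok bool)
}

// healthCheckedPool keeps only targets whose health endpoint returned
// 200 on the last probe and hands them out round robin.
type healthCheckedPool struct {
	mu      sync.Mutex
	targets []string // e.g. "http://receiver-0:9201"
	healthy []string
	next    int
}

var _ Balancer = (*healthCheckedPool)(nil)

func (p *healthCheckedPool) Pick() (string, bool) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if len(p.healthy) == 0 {
		return "", false
	}
	t := p.healthy[p.next%len(p.healthy)]
	p.next++
	return t, true
}

// probe re-checks every target; "/-/healthy" is just an assumed path.
func (p *healthCheckedPool) probe(client *http.Client) {
	var healthy []string
	for _, t := range p.targets {
		resp, err := client.Get(t + "/-/healthy")
		if err == nil && resp.StatusCode == http.StatusOK {
			healthy = append(healthy, t)
		}
		if resp != nil {
			resp.Body.Close()
		}
	}
	p.mu.Lock()
	p.healthy = healthy
	p.mu.Unlock()
}

func (p *healthCheckedPool) run(interval time.Duration) {
	client := &http.Client{Timeout: 2 * time.Second}
	for range time.Tick(interval) {
		p.probe(client)
	}
}

func main() {
	pool := &healthCheckedPool{targets: []string{"http://receiver-0:9201", "http://receiver-1:9201"}}
	pool.probe(&http.Client{Timeout: 2 * time.Second}) // initial probe before the background loop
	go pool.run(10 * time.Second)
	if target, ok := pool.Pick(); ok {
		_ = target // send the remote write request to target
	}
}
```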
I think this is a niche use case where we should not invent new protocols etc. I am happy with the current retry process we have, and I would not add more complexity to the Prometheus implementation. If there are bugs in the retry process, however, we should fix them.
I don't think that this necessarily requires any new or changed protocols. Health checks are as simple as an endpoint that returns 200, which probably already exists in almost all remote write solutions (we don't have to standardize them). Taking account of targets' load is a separate feature that could be used both in client-side load balancing and in other ways of sending data, and is mostly irrelevant to our consideration of this.
Although this might add more complexity to the innards of the Prometheus implementation, to a user it would be as simple as specifying the addresses of their remote write targets and a health-check endpoint, nothing more (SD of targets could be considered later). As opposed to setting up a full-on load balancer, this is one less service to manage and is easy and effective.
I'm still not sure what the use case is that we're wanting to support.
The problem that I would like to have a solution for is that new pods added to a service receiving remote write requests will not receive any requests right now, as the connections to the old pods will be reused forever. This makes it very annoying to horizontally scale a remote write receiver. In some low-request-rate scenarios where only a single connection is needed, all requests will end up going to a single pod even if you have multiple pods.
https://github.com/grpc/grpc/blob/master/doc/load-balancing.md is a good read; right now a user would have to follow the Proxy Model to get reasonable load balancing. I think it would be nice to support one of the other models to avoid the inefficiency of proxies, though they will add complexity to Prometheus.
If it is agreed that this is something we want, I would say we should start with simple round-robin load balancing based on a list defined by service discovery. I wonder if that would get us pretty far without the added complication of health checks. Some requests would need to be retried if pointed to a failing server until it is removed from the service discovery implementation, but they would be retried anyway. Alternatively, we could try to just use DNS lookups based on the URL and send directly to the underlying IP addresses rather than service discovery.
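As a sketch of the "DNS lookups based on the URL" alternative, combined with the retry behaviour described above (the hostname, port, and path are placeholders):

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"net/http"
	"time"
)

// sendWithFallback resolves the receiver's hostname, then tries each
// resolved IP in turn until one accepts the request.
func sendWithFallback(client *http.Client, host, port, path string, body []byte) error {
	ips, err := net.LookupIP(host)
	if err != nil {
		return err
	}
	var lastErr error
	for _, ip := range ips {
		url := fmt.Sprintf("http://%s%s", net.JoinHostPort(ip.String(), port), path)
		resp, err := client.Post(url, "application/x-protobuf", bytes.NewReader(body))
		if err != nil {
			lastErr = err
			continue // failing pod: try the next resolved address
		}
		resp.Body.Close()
		if resp.StatusCode/100 == 2 {
			return nil
		}
		lastErr = fmt.Errorf("%s returned %s", url, resp.Status)
	}
	return lastErr
}

func main() {
	client := &http.Client{Timeout: 5 * time.Second}
	_ = sendWithFallback(client, "store-proxy.example", "9201", "/receive", []byte("..."))
}
```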
I have Prometheus and my remote write collector in the same namespace and cluster. I tried using a LoadBalancer-type service, but the sticky connections are still there because a GKE LoadBalancer service uses a passthrough load balancer.
As an alternative, I am thinking of using internal load balancing with an Ingress, following this documentation.
Is this the right approach, given that I just want in-cluster communication between services?
Hello from the bug scrub!
Since there has been no movement here in over a year, and the problem can be solved using a proxy (e.g. Envoy), we will close this.
Relevant story: https://misterhex.github.io/Envoy-client-load-balancing/
Proposal
I deploy a remote-write adapter service in front of remote storage. Every Prometheus instance writes its local data to the adapter. But the load among the different remote-write adapter pods is unbalanced, especially when the number of requests is low. This happens because the remote-write client uses a persistent connection, and that connection keeps sending data to the same pod once created. I propose adding a connection-reset interval option to the remote write/read configuration so that remote write/read connections are reset periodically.
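A minimal sketch of what such a reset interval could do on the client side, assuming a plain net/http transport; the interval and wiring are illustrative, and this is not an existing remote_write option:

```go
package main

import (
	"net/http"
	"time"
)

// resetIdleConnections closes the transport's idle connections on every
// tick, forcing subsequent requests to dial again and letting the load
// balancer spread them over new pods.
func resetIdleConnections(t *http.Transport, interval time.Duration) {
	go func() {
		for range time.Tick(interval) {
			t.CloseIdleConnections()
		}
	}()
}

func main() {
	transport := &http.Transport{}
	resetIdleConnections(transport, 2*time.Minute)
	client := &http.Client{Transport: transport}
	_ = client // use client for remote write requests
}
```

Whether the new connections actually land on different pods still depends on the load balancer or service in front of the receivers.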