thanos-io / thanos

Highly available Prometheus setup with long term storage capabilities. A CNCF Incubating project.
https://thanos.io
Apache License 2.0

querier: Add option for different gRPC client Loadbalancing options for StoreAPI #1083

Open bwplotka opened 5 years ago

bwplotka commented 5 years ago

Currently, if someone passes a domain to --store or via file SD without DNS lookups, the target is passed to gRPC DialContext, which load-balances across the addresses if the A lookup returns more than one. However, the default seems to be pick_first, not round_robin, which might be confusing in some cases, e.g. smoothing the traffic across multiple replicas of store gateways.
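
For reference, a minimal client-side sketch (not Thanos code; the target address and port are made up, and `grpc.WithDefaultServiceConfig` is just one way to select the policy in grpc-go) showing how the DNS resolver plus a service config can opt into round_robin instead of the default pick_first:

    package main

    import (
        "context"
        "log"
        "time"

        "google.golang.org/grpc"
    )

    func main() {
        ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
        defer cancel()

        // "dns:///" selects grpc-go's DNS resolver, which hands every A record
        // to the balancer; without a service config the default policy is pick_first.
        conn, err := grpc.DialContext(ctx, "dns:///store-gateway.example.svc:10901",
            grpc.WithInsecure(),
            grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
        )
        if err != nil {
            log.Fatalf("dialing StoreAPI: %v", err)
        }
        defer conn.Close()
        // conn can now be used to build a StoreAPI client.
    }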

AC:

bwplotka commented 5 years ago

cc @devnev @mjd95

povilasv commented 5 years ago

IMO we shouldn't really do this. I had a chat with the gRPC folks about the 30 min DNS refresh interval hardcoded into the grpc-go package and they said PRs changing that aren't welcome. And it doesn't make sense for them to implement all of the LB strategies in all the languages. ATM it's easier to run envoy, nginx, etc. to do this. In the future they are working on a gRPC LB server.

povilasv commented 5 years ago

This is the snippet from our chat:

Povilas Versockas @povilasv Mar 12 12:56
Hey, guys, I've been looking at gRPC load balancing on Kubernetes and I would like to go with the grpc-go client looking up DNS records and doing round-robin load balancing. The only thing that worries me is that right now the default DNS refresh interval is 30 mins (https://github.com/grpc/grpc-go/blob/master/resolver/dns/dns_resolver.go#L47) and there is no way to change it. Would you guys be open to a PR which makes this configurable? I was thinking of adding a WithRefreshInterval(time.Duration) DialOption. Or do you guys generally recommend proxy solutions (like envoy) doing load balancing in K8s?

Eric Anderson @ejona86 Mar 12 17:20
@povilasv, please no. Even the current interval would like to be removed. Instead, use KeepaliveParams with MaxConnectionAge. When the client receives disconnects, it will refresh DNS.

Povilas Versockas @povilasv Mar 12 19:20
@ejona86 but what about the case where I scale up my replicas? In this case clients will never know about the new servers for 30 mins?
IMO in an ideal scenario the gRPC LB pool should be refreshed based on the DNS record's TTL, no?

Eric Anderson @ejona86 Mar 12 20:05
@povilasv, you set the MaxConnectionAge as the longest you are willing to wait for the clients to rebalance. It works for scaling up.
TTL causes trouble because 1) we can't get it in many cases and 2) it causes stampeding herds when caching DNS servers are in play.
Note that most existing HTTP-based systems do not base anything off TTL. Instead, they just create new connections frequently.

Edgar Romero @alfasapy Mar 12 20:10
@ejona86 what are the drawbacks of a very short maxConnectionAge? Assuming that the maxConnectionAgeGrace is high enough to prevent in-flight requests from being cancelled.

Eric Anderson @ejona86 Mar 12 20:12
It just creates more connections. At O(seconds), and definitely O(minutes), the connection cost is already amortized. There will be "hiccups" of extra latency when creating new connections.

Povilas Versockas @povilasv Mar 12 21:23
ok makes sense, thank you :)
I feel like this should be documented better, is there a place I could do that?

Eric Anderson @ejona86 Mar 12 22:17
It's not clear to me where that place would be.

Povilas Versockas @povilasv Mar 12 22:22
ok, thanks for the help, I really appreciate it

povilasv commented 5 years ago

IMO pick_first is a pretty good LB default: Grafana -> Thanos Query -> Thanos Store becomes sticky per user, meaning that if Grafana refreshes and re-sends the query, it will already be in the cache.

bwplotka commented 5 years ago

I don't think it is good, as it does not balance the load evenly. At least we might want to add an option for changing this as a first iteration.

I wonder if some more customized LB makes sense then? One that is aware of client context?

povilasv commented 5 years ago

The only proper way to balance the load evenly and not deal with upscale issues is to run sidecar proxies.

The only other option is to add maxConnectionAge to the Thanos Store, Sidecar and Rule gRPC servers, which is super hacky and I wouldn't like that.

AFAIK the gRPC lookaside load balancer, where the gRPC client would actually connect to a lookaside server, get its routes from there and keep refreshing them, is still in development:

https://grpc.io/blog/loadbalancing/

So we don't really have any good options currently, apart from asking users to use sidecar proxies.

povilasv commented 5 years ago

Hmm, maybe it's not that bad, we could do something like this on all of our gRPC servers:

    // Assumes the usual grpc-go and go-grpc-middleware imports
    // (google.golang.org/grpc, google.golang.org/grpc/keepalive, etc.).
    grpcServer := grpc.NewServer(
        grpc.UnaryInterceptor(grpc_middleware.ChainUnaryServer(
            grpc_prometheus.UnaryServerInterceptor,
            grpc_recovery.UnaryServerInterceptor(),
            grpc_logrus.UnaryServerInterceptor(log.NewEntry(log.StandardLogger())),
        )),
        grpc.KeepaliveParams(keepalive.ServerParameters{
            // Close connections that have been idle for 10 minutes.
            MaxConnectionIdle: 10 * time.Minute,
            // Force clients to reconnect (and re-resolve DNS) every 5 minutes.
            MaxConnectionAge: 5 * time.Minute,
            // Give in-flight RPCs up to 1 minute to finish before closing.
            MaxConnectionAgeGrace: 1 * time.Minute,
            // Send keepalive pings every 2 minutes to detect dead connections.
            Time: 2 * time.Minute,
        }),
    )
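
For context from the chat above: `MaxConnectionAge` makes the server close connections after a while, so clients reconnect and re-resolve DNS, which is how they eventually see newly added replicas even without a configurable refresh interval.
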
bwplotka commented 5 years ago

So I don't really understand the problem you are mainly trying to explain here. I don't think any DNS refresh is needed. As the answer you got says, it is all about the broken connection and the gRPC client doing a DNS lookup and choosing another host. Why is that problematic?

However, the actual issue here is not about that. It's about the load-balancing method we can use. In fact we already use one (you have to use something), which is pick_first. I vote we should have round robin available, and maybe some sticky round robin as well that is aware of the client. I am pretty sure there are existing Golang implementations of this in the wild (:

bwplotka commented 5 years ago

OK, so I talked with @povilasv offline and the issue with RR is:

upscale: if you do round robin the gRPC client won't see the new node

(https://kubernetes.io/blog/2018/11/07/grpc-load-balancing-on-kubernetes-without-tears/)

Maybe it's right. I would then suggest using something like this: https://github.com/improbable-eng/kedge/blob/9a6745fb10e47ef4da101bb8e1cafd329ca397a7/pkg/kedge/http/lbtransport/policy.go#L60 It's battle-tested on our prod already.
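
To give a rough idea of what such a policy does (a simplified sketch in the spirit of that kedge file, not the actual implementation, and the names below are made up): a round-robin picker that temporarily blacklists targets reported as failing.

    package lbpolicy

    import (
        "sync"
        "time"
    )

    // roundRobinWithBlacklist rotates over targets, skipping ones that
    // failed within the backoff window.
    type roundRobinWithBlacklist struct {
        mu          sync.Mutex
        next        int
        failedUntil map[string]time.Time
        backoff     time.Duration
    }

    func newPolicy(backoff time.Duration) *roundRobinWithBlacklist {
        return &roundRobinWithBlacklist{failedUntil: map[string]time.Time{}, backoff: backoff}
    }

    // Pick returns the next healthy target, falling back to plain round robin
    // if every target failed recently.
    func (p *roundRobinWithBlacklist) Pick(targets []string) string {
        p.mu.Lock()
        defer p.mu.Unlock()
        if len(targets) == 0 {
            return ""
        }
        for i := 0; i < len(targets); i++ {
            t := targets[p.next%len(targets)]
            p.next++
            if until, ok := p.failedUntil[t]; !ok || time.Now().After(until) {
                return t
            }
        }
        t := targets[p.next%len(targets)]
        p.next++
        return t
    }

    // ReportFailure removes a target from rotation for the backoff period.
    func (p *roundRobinWithBlacklist) ReportFailure(target string) {
        p.mu.Lock()
        defer p.mu.Unlock()
        p.failedUntil[target] = time.Now().Add(p.backoff)
    }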

povilasv commented 5 years ago

@bwplotka Maybe you guys could separate it out into a different Go package?

As importing kedge doesn't seem reasonable :dagger: Apart from that :+1:

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bwplotka commented 4 years ago

We can indeed. Let's get back to this. (:

stale[bot] commented 4 years ago

Hello 👋 Looks like there was no activity on this issue for last 30 days. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity for next week, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 4 years ago

Closing for now as promised, let us know if you need this to be reopened! 🤗

stale[bot] commented 4 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

bwplotka commented 4 years ago

Very needed, potentially blocking https://github.com/thanos-io/thanos/issues/3373

Also again mentioned here https://cloud-native.slack.com/archives/CK5RSSC10/p1604923594031700

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

kakkoyun commented 3 years ago

Still valid.

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 3 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

stale[bot] commented 3 years ago

Closing for now as promised, let us know if you need this to be reopened! 🤗

aymericDD commented 3 years ago

I think it is still interesting.

stale[bot] commented 2 years ago

Hello 👋 Looks like there was no activity on this issue for the last two months. Do you mind updating us on the status? Is this still reproducible or needed? If yes, just comment on this PR or push a commit. Thanks! 🤗 If there will be no activity in the next two weeks, this issue will be closed (we can always reopen an issue if we need!). Alternatively, use remind command if you wish to be reminded at some point in future.

yeya24 commented 7 months ago

Is this still relevant now that we have endpoint groups?