palantir / conjure-java-runtime

Opinionated libraries for HTTP&JSON-based RPC using Dialogue, Feign, OkHttp as clients and Jetty/Jersey as servers
Apache License 2.0

http-remoting does not load balance requests #536

Closed sirstevepal closed 6 years ago

sirstevepal commented 7 years ago

I have noticed that http-remoting does not balance requests across the list of nodes that is passed in. At the moment it seems to fail over to other nodes only after a failed request.

Are there plans to make this balance requests across all nodes in the list?

uschi2000 commented 7 years ago

It's not so obvious that this is generally the desired behavior (at least with our standard networking setups today). For example, some services cache (partial) responses internally; thus, randomizing nodes could yield cache misses on many code paths.

uschi2000 commented 7 years ago

More abstractly, a certain notion of sticky sessions may be desired even when nodes are semantically stateless.

uschi2000 commented 7 years ago

And yet I agree that there are other services for which per-call randomization is desired :/

sirstevepal commented 7 years ago

Yeah I think the main concern is that I want to be able to scale services horizontally and at the moment that's not quite achievable.

uschi2000 commented 7 years ago

Different consumers will use different target URLs, but every single consumer keeps hitting the same node until it becomes unavailable. Again, there’s a tradeoff between always randomizing and allowing for a certain type of locality, and I’m not convinced we have empirically explored what this tradeoff looks like in different use-cases.
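
To make the tradeoff concrete, here is a minimal sketch of the two behaviours being compared. It is not the http-remoting implementation: the class names `PinUntilFailure` and `RoundRobin`, the `markFailed` callback, and the node URLs are all hypothetical, and round-robin stands in for per-call randomization because it is deterministic and easier to reason about.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch of two node selection strategies; not the http-remoting API. */
public class NodeSelection {

    /** Keeps returning the same node and only moves on when that node is reported as failed. */
    static final class PinUntilFailure {
        private final List<String> urls;
        private final AtomicInteger current = new AtomicInteger(0);

        PinUntilFailure(List<String> urls) {
            this.urls = urls;
        }

        String next() {
            return urls.get(Math.floorMod(current.get(), urls.size()));
        }

        void markFailed(String url) {
            // Advance only if the failed node is still the one we are pinned to.
            int index = current.get();
            if (urls.get(Math.floorMod(index, urls.size())).equals(url)) {
                current.compareAndSet(index, index + 1);
            }
        }
    }

    /** Rotates through all nodes, one per call, regardless of failures. */
    static final class RoundRobin {
        private final List<String> urls;
        private final AtomicInteger counter = new AtomicInteger(0);

        RoundRobin(List<String> urls) {
            this.urls = urls;
        }

        String next() {
            // floorMod keeps the index valid even after the counter overflows.
            return urls.get(Math.floorMod(counter.getAndIncrement(), urls.size()));
        }
    }

    public static void main(String[] args) {
        // Placeholder URLs, assumed for illustration only.
        List<String> urls = List.of("https://node-a", "https://node-b", "https://node-c");

        PinUntilFailure pinned = new PinUntilFailure(urls);
        RoundRobin rotating = new RoundRobin(urls);
        for (int call = 0; call < 4; call++) {
            System.out.println("pinned=" + pinned.next() + " rotating=" + rotating.next());
        }
    }
}
```

With the pinned strategy a single consumer gives one node all of its traffic, which favours that node's caches; with rotation the load spreads evenly but each node sees only a fraction of any consumer's requests.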


schlosna commented 6 years ago

I think we'll need to do some work here, as in most cases we probably want even load distribution rather than sticky sessions kept for cache optimizations (and I'd posit we should push those cache optimizations down a level so they're shared, but that's out of scope for this issue).

@chrisalice for SA

alicederyn commented 6 years ago

Presumably we'd want to reuse the same node for the same user (since that's likely to increase cache hits) and randomise nodes between users (since that distributes load better) -- can we leverage the On-Behalf-Of work for this? (This is one of the strategies good load-balancing proxies use.)
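
A rough sketch of what that could look like, assuming the caller can supply some stable user identifier (how it would be derived from On-Behalf-Of is not specified here). The `UserAffinity` class, the `nodeFor` method, and the `unavailable` set are hypothetical names for illustration, not part of http-remoting.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of per-user node affinity; not the http-remoting API. */
public class UserAffinity {

    /**
     * Maps a user to a preferred node by hashing the user id, then walks
     * forward through the list if the preferred node is marked unavailable.
     */
    static String nodeFor(String userId, List<String> urls, Set<String> unavailable) {
        // floorMod avoids a negative index when hashCode() is negative.
        int start = Math.floorMod(userId.hashCode(), urls.size());
        for (int offset = 0; offset < urls.size(); offset++) {
            String candidate = urls.get((start + offset) % urls.size());
            if (!unavailable.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no available nodes");
    }

    public static void main(String[] args) {
        // Placeholder URLs and user ids, assumed for illustration only.
        List<String> urls = List.of("https://node-a", "https://node-b", "https://node-c");
        for (String user : List.of("alice", "bob", "carol")) {
            System.out.println(user + " -> " + nodeFor(user, urls, Set.of()));
        }
    }
}
```

The same user always lands on the same node while it is available, and different users spread across nodes as evenly as the hash allows. One caveat of plain modulo hashing is that changing the node list remaps most users; consistent hashing is the usual way to limit that.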

uschi2000 commented 6 years ago

I don't think we're going to implement this in http-remoting. More complex routing logic should be handled in the proxy mesh layer.