twitter / finagle

A fault tolerant, protocol-agnostic RPC system
https://twitter.github.io/finagle
Apache License 2.0

How to set Host header with LoadBalancing #562

Open saint1991 opened 8 years ago

saint1991 commented 8 years ago

This may already be under discussion in #560, but I got confused when using Finagle's load balancing feature. To distribute requests across multiple servers, we instantiate a service via Http.Client#newService as follows:

val client = Http.client.newService("hostA.org:80, hostB.org:80")

Then we construct a request:

val request = Request("/path/to/service")
request.host = ???

client(request)

What should we fill ??? with?

Moreover, we're forced to pass the host addresses twice: once when creating the service and again when building the request. It's not DRY.

Is there a reason the Host header isn't set automatically from the address given to the service?

bryce-anderson commented 8 years ago

I think that it would be nice for the load balancer to set the host header. We don't use the host header much, so this hasn't been a priority for us. That said, I think it would be really good to do as not everyone is so lax with their HTTP.

I'm sorry I don't have better news for you.

saint1991 commented 8 years ago

@bryce-anderson Thank you for your kind reply!

I opened this issue because HTTP/1.1 requires us to set the Host header. Some middleware, e.g. Consul, doesn't accept a request without a Host header and returns 400 Bad Request.

I hope this issue will be resolved in the future, thanks!

vigneshwaranr commented 7 years ago

Is this not resolved yet? It bit me :(

bryce-anderson commented 7 years ago

@vigneshwaranr, I'm sorry to say that we haven't invested in this yet.

mosesn commented 7 years ago

@vigneshwaranr would you like to take a stab at this? It might look something like having an option on the Http client that lets you set the host header for requests if it hasn't been set yet.

justinpermar commented 6 years ago

We can't use Finagle with AWS ALB unless this is fixed. It's a blocker for us. I can probably devote time to fixing this if anyone would be kind enough to point me in the right direction.

mosesn commented 6 years ago

@justinpermar try making a Stack module beneath the load balancer. The load balancer injects the address of the remote peer into Stack.Params, so you should be able to use that.

Also, can you elaborate on why you need this ability? Host headers typically refer to the virtual host, and since we typically load balance over remote peers in a cluster that are all the same, they typically share a virtual host.
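For readers who want to try the Stack module approach, here is a minimal sketch of what it might look like. It is not an API that Finagle ships: `EndpointHostHeader`, `HostFilter`, and `newClient` are hypothetical names. The sketch assumes that modules inserted after `LoadBalancerFactory.role` are materialized once per endpoint with `Transporter.EndpointAddr` in their params, and that the socket address still carries a usable host string (it may be an IP literal, depending on how the name was resolved).

```scala
import com.twitter.finagle.{Address, Http, Service, ServiceFactory, SimpleFilter, Stack, Stackable}
import com.twitter.finagle.client.Transporter
import com.twitter.finagle.http.{Request, Response}
import com.twitter.finagle.loadbalancer.LoadBalancerFactory
import com.twitter.util.Future

// Hypothetical module, not part of Finagle: sets Host from the endpoint address.
object EndpointHostHeader {
  val role: Stack.Role = Stack.Role("EndpointHostHeader")

  // Fills in the Host header only when the caller has not set one.
  private class HostFilter(host: String) extends SimpleFilter[Request, Response] {
    def apply(request: Request, service: Service[Request, Response]): Future[Response] = {
      if (request.host.isEmpty) request.host = host
      service(request)
    }
  }

  val module: Stackable[ServiceFactory[Request, Response]] =
    new Stack.Module1[Transporter.EndpointAddr, ServiceFactory[Request, Response]] {
      val role: Stack.Role = EndpointHostHeader.role
      val description: String = "Set the Host header from the selected endpoint address"

      def make(
        endpoint: Transporter.EndpointAddr,
        next: ServiceFactory[Request, Response]
      ): ServiceFactory[Request, Response] = endpoint.addr match {
        // Assumption: host:port of the chosen peer is an acceptable Host value.
        case Address.Inet(isa, _) =>
          new HostFilter(s"${isa.getHostString}:${isa.getPort}").andThen(next)
        // Non-inet addresses are left untouched.
        case _ => next
      }
    }

  // Installs the module beneath the load balancer so it runs per endpoint.
  def newClient(dest: String): Service[Request, Response] =
    Http.client
      .withStack(Http.client.stack.insertAfter(LoadBalancerFactory.role, module))
      .newService(dest)
}
```

With this in place, `EndpointHostHeader.newClient("hostA.org:80,hostB.org:80")` would be used like any other client service. Requests that already carry a Host header pass through unchanged.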

justinpermar commented 6 years ago

@mosesn As noted above, the HTTP/1.1 spec requires a valid Host header. We're making requests with Finagle to AWS ALB, which rejects them with a 400: https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html#http-400-issues.

mosesn commented 6 years ago

@justinpermar my question is why you need a different host header per endpoint. Is there a single virtual host which represents everything in your remote cluster? For example, if I talk to google.com, I always set my host header as google.com, even though I might choose a specific IP address to talk to.

I'm wondering if maybe you could set the host header yourself, instead of relying on finagle doing it on your behalf after load balancing.

justinpermar commented 6 years ago

@mosesn Ah I see what you're asking now. Yes, setting the host header is a workaround for our use case (I'm currently in the process of testing our application with a special-case configuration that does just that). But it's very hacky, since I have to set a host header manually for each request instead of letting Finagle just "do the right thing". As an aside, curl and other http tools set the host header, so it seems like a good idea that Finagle follows suit.

mosesn commented 6 years ago

@justinpermar yeah, the tricky thing is that curl and other HTTP tools can infer the host header because they assume you pass a URL. Finagle doesn't assume that, and indeed many users just resolve IP addresses from zookeeper (or similar). We could potentially add an HTTP-specific configuration method as a helper, something like Http.client.withHostHeader("twitter.com") but then the downside would be that if you dump the request before you pass it to the client, it doesn't actually reflect what it will look like when it's sent.

The RequestBuilder API is intended to address this problem, but it's pretty limited in what it checks, since HTTP is such a complicated protocol.

dreverri commented 6 years ago

Is there anything wrong with setting the host header via a filter?

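import com.twitter.finagle.{Http, Service, SimpleFilter}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.Future

// Overwrites the Host header on every request before passing it on.
case class WithHostHeader(host: String) extends SimpleFilter[Request, Response] {
  override def apply(request: Request, service: Service[Request, Response]): Future[Response] = {
    request.host = host
    service(request)
  }
}

WithHostHeader("localhost").andThen(Http.newService("localhost"))

mosesn commented 6 years ago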

@dreverri no, that will work fine. This issue is specifically for when you want a different host header for different remote peers in the same loadbalancer.
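For the single-virtual-host case, a variant of the filter above can leave an existing Host header alone, along the lines of the "if it hasn't been set yet" idea raised earlier in the thread. This is only a sketch: `DefaultHostHeader`, `DefaultHostHeaderExample`, and the host name `service.example.org` are made up for illustration and are not part of Finagle.

```scala
import com.twitter.finagle.{Http, Service, SimpleFilter}
import com.twitter.finagle.http.{Request, Response}
import com.twitter.util.Future

// Hypothetical filter: supplies a default Host header without overriding the caller's.
case class DefaultHostHeader(host: String) extends SimpleFilter[Request, Response] {
  override def apply(request: Request, service: Service[Request, Response]): Future[Response] = {
    // request.host is an Option[String]; only fill it in when it is empty.
    if (request.host.isEmpty) request.host = host
    service(request)
  }
}

object DefaultHostHeaderExample {
  // Every peer behind the balancer sees the same virtual host name.
  def newClient(): Service[Request, Response] =
    DefaultHostHeader("service.example.org")
      .andThen(Http.client.newService("hostA.org:80,hostB.org:80"))
}
```

Because the filter sits above the load balancer, it cannot know which peer will be chosen, so it only suits clusters that share one virtual host.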

taborda commented 6 years ago

Hi

It would be nice to have this feature.

@mosesn are you able to give more detail on this, please?

"try making a Stack module beneath the load balancer. The load balancer injects the address of the remote peer into Stack.Params, so you should be able to use that."

I would like to give it a try

Thanks

godsey commented 3 years ago

I found a way around this, but it could get expensive. I set up a CloudFront distribution with a 1-second min/max TTL pointed at the ALB. I then created an origin request Lambda. This allows my backend to access the original Host header via X-Original-Host.

import json

def lambda_handler(event, context):
    # The CloudFront request object from the origin-request event.
    request = event['Records'][0]['cf']['request']
    # Copy the viewer's Host header into a custom origin header.
    request['origin']['custom']['customHeaders']['x-original-host'] = [
        {
            'key': 'X-Original-Host',
            'value': request['headers']['host'][0]['value'],
        },
    ]
    return request