twitter / finagle

A fault tolerant, protocol-agnostic RPC system
https://twitter.github.io/finagle
Apache License 2.0
8.79k stars 1.46k forks source link

Finagle headers blocked by AWS CloudFront's new Cache/Origin Policies #869

Open tbcd opened 4 years ago

tbcd commented 4 years ago

I've filed a support case with AWS, but just a heads-up, and to save anyone else out there searching for answers...

AWS CloudFront launched a new Cache/Origin Policies feature a few weeks ago. While legacy policies can still be used for now, the new policy features are selected by default for new distributions.

The regression is that they completely reject all client requests containing HTTP header-names with a period (as 400 Bad Request). There's currently no workaround for this other than to switch back to a legacy policy. Affects both HTTP and HTTPS requests.

Discovered this when Twitterbot was failing to read meta-tags for Twitter Cards. Narrowed it down to a Finagle-Ctx-com.twitter.finagle.Retries: 0 HTTP header that Twitterbot was only including in HTTP requests, not HTTPS requests. The Twitter Card Validator helpfully says INFO: Page fetched successfully despite it getting a 400 Bad Request....

Why would anyone care about HTTP in this case? Because Twitter uses an http:// protocol to decide how to render URLs that don't specify a protocol.

eg. Typing example.com into a tweet will look for Twitter Card meta-tags on http://example.com. With AWS CloudFront's new Cache/Origin Policies, this request is rejected, so Twitterbot never receives a 301 redirecting to https://example.com, and no card is displayed.

So this isn't technically a bug in Finagle, but those headers sure are ugly. Either way, hopefully CloudFront will address this.

For what it's worth, AWS Elastic Load Balancer does not reject the requests, but appeared to be silently removing these headers from the requests. I have only briefly tested ELB with them, however. (Edit: Re-tested ELB with packet capture, and it is passing through all headers un-modified, so this part can be ignored.)

I'll post a report on the Twitter Developer forums as well.

bryce-anderson commented 4 years ago

Thanks for the heads up. Seems like a curious rule.

hexawyz commented 4 years ago

My team stumbled on this problem – or at least very similar one – very recently, while sending requests to a server that we don't own. (Using Finagle version 19.9.0)

After reading the code, and a bit of help from my coworker, I came up with this workaround to remove undesired headers from Finagle requests, but it feels a bit brittle:

Http.client
    .withStack(_.remove(Stack.Role("ClientContext")).remove(TraceInitializerFilter.role))

We are working around two issues here:

  1. Though Finagle is not entirely at fault here (it is allowed by spec), the use of dots in header names is quite unnatural, and likely to cause trouble with overzealous HTTP servers. (As was the case here) I'm more interested in a fix to the second issue, but I guess it would be nice if you addressed this one, too 🙂
  2. Finagle should provide out of the box a very simple option to opt-out (or opt-in?) of finagle custom HTTP headers. Because we may need to contact "random" servers on the internet (for example, think of a web crawler), it is very important that any HTTP API we use let us control the headers that are sent to external servers. This is for reasons as simple as increased anonymity or "not breaking" remote servers. Finagle not letting us do that easily seems like a very arbitrary choice, and I hope this can be improved to avoid future occurences of this problem.

Could you implement a mechanism to easily disable finagle custom headers and/or officially document the correct way to do it?

mosesn commented 4 years ago

@GoldenCrystal I don't think we have the bandwidth to implement that right now, but we'd be happy to guide you through building it, if you do!