Remove content-encoding header when response is decoded

turley commented 10 years ago

I'm guessing others have different opinions on this, but I believe em-http-request should remove the Content-Encoding header when it decodes a response itself.

This allows em-http-request to operate more consistently with other http libraries (including Net::HTTP) when used in projects like Faraday with interchangeable adapters. It also fits the idea that content encoding should be "transparent" to the application - handled entirely at the protocol level.

As an example, this behavior is currently breaking the Google API Ruby Client (https://github.com/google/google-api-ruby-client/blob/master/lib/google/api_client/gzip.rb) when Faraday uses the em_synchrony adapter since the Google client assumes that if Content-Encoding is set to "gzip", it should decompress it there. Since em-http-request is returning a decompressed body and the "Content-Encoding: gzip" header, the Google API Client is failing when attempting to decompress the (already decompressed) body.

Of course, some may be of the opinion that the problem should be fixed higher up (in Faraday or the Google client in this case), but I feel like it's perfectly reasonable for a higher-level project to assume consistency between the Content-Encoding header and body data.

I'd love to hear other thoughts on this.

igrigorik commented 10 years ago

There are good arguments for and against making this change.. on balance I'm leaning against it.

The consistency between net/http is nice.
Conceptually, removing the header makes sense, since the body is decompressed.

That said, in practice, my experience with libraries like WinINet show that hiding headers is a bad idea in the long run -- makes building and debugging things on top of it hard, and some features impossible. I agree that the mismatch can be somewhat confusing, but em-http also exposes an easy API (:decoding => false) that allows you to skip the auto-decode (sounds like an update that needs to be made to Faraday). Also, on a practical side, a change like this would require a major version bump -- it runs the risk of breaking many people's code.

turley commented 10 years ago

I agree with the points made about not wanting to mess with response headers, especially if the requester didn't ask us to do that. Going further, my opinion is that if em-http-request didn't add the Accept-Encoding header (which I don't think it does), then it should have no part in decompressing just because it can by default. If hiding response headers without being asked is problematic, then I definitely feel like decompressing (without being explicitly asked) is also something we should avoid.

Take Faraday as an example (I realize other projects use em-http-request directly, but this illustrates the problem): we now have em-http-request decompressing, Faraday middleware decompressing, and some clients built on top of Faraday decompressing. Faraday middleware - being a "responsible citizen" - only attempts to decompress when the middleware is explicitly enabled (no surprises), and when it decompresses, it resets the Content-Length and Content-Encoding headers (https://github.com/lostisland/faraday_middleware/blob/master/lib/faraday_middleware/gzip.rb#L36-L37).

As I see it, this is the only safe way to do decompression; that is, only do it if you set the Accept-Encoding header; i.e. if you asked for it, then you can handle/decompress it. And "handling" it probably means also resetting those response headers so that the body and headers are consistent. Otherwise, we have all these layers trying to guess which other one asked for it and whether or not to decompress.

If someone at a higher level added the Accept-Encoding header, then em-http-request (in my view) should have no right to assume they want the response decompressed before it gets back to them - unless they deliberately enable decompression in em-http-request. By default, that higher-level requester most likely expects a compressed response to arrive at that level - especially if the Content-Encoding header is still something like "gzip" when it gets to that level. Heck, maybe they never intend to decompress at all. Requesting a compressed response with Accept-Encoding does not necessarily imply that it will (or should be) be decompressed.

In summary, em-http-request should probably make decompression off by default, and only enable decompression if the requester opts to turn it on.

As for requiring a major version bump for the change, that makes sense. Unfortunately, it sounds like many people rely on this (somewhat unintuitive) default behavior, so I wouldn't want to break it for them. Still, it should be noted that the current behavior is already breaking things for other people expecting it to work consistently (with respect to decompression) with other popular HTTP adapters.

igrigorik commented 8 years ago

@turley apologies for the slightly delayed response.. ^_^

You've raised good points. Thinking out loud, how about the following:

Change default behavior to add Accept-Encoding: gzip, deflate: in this day and age, it's the right default. This also means that we keep the default decoding behavior, which we can expose control over via new :compressed variable that's true by default.
- compressed: true + decoding: true will be the default.
- you can request compressed data without decoding.
- you can requested uncompressed data.
I'm still leaning towards keeping headers intact. In the case where we're layering middleware, the level that interacts with em-http can control how the data is requested and whether it wants to prune any headers before passing them upstream (auto-pruning in em-http breaks in opposite direction).

turley commented 8 years ago

That seems like a good way to handle it. I like the idea of exposing the options for those who care to change the default behavior.

Just FYI: the first place I encountered this issue (Google Cloud Client, mentioned earlier) has since added special documentation for use with Faraday (and em-http-request) to override the gzip middleware with a somewhat kludgy workaround: https://github.com/GoogleCloudPlatform/gcloud-ruby/pull/369

In cases like that, the compressed and decoding options would offer sufficient control to avoid the problem without such a workaround, though it would still require coding to the special case among alternative http libraries - at least when those other layers can't rely on the Content-Encoding header to decide.

It's your call (obviously) with regard to the response headers. I was just looking at net/http, and since Ruby 2, it adds Accept-Encoding: gzip ... to requests by default (and decompresses the response), but it also removes the Content-Encoding header in the response (I assume so other layers won't be confused). Of course, em-http-request doesn't have to follow that, but it might be worth considering.

igrigorik / em-http-request

Remove content-encoding header when response is decoded #270