Constructing a Response with Content-Encoding?

kentonv commented 7 years ago

Section 4.6 step 16.1.2 specifies that Content-Encoding is handled (responses are decoded) before fetch() completes, so the returned Response already has a decompressed body (but response.headers.get('content-encoding') still returns the original encoding).

This seems to create an inconsistency. What happens if I want to construct a Response manually that contains gzipped content? Say, for example, that I wish to construct a Response for the purpose of returning from a FetchEvent handler in a ServiceWorker. If I say:

event.respondWith(new Response(data, {headers: {
  'Content-Type': 'text/plain',
  'Content-Encoding': 'gzip',
  'Content-Disposition': 'attachment; filename="file.txt.gz"'
}}));

What happens? I can imagine a few possibilities:

1. data should be uncompressed. The implementation will automatically compress it according to the Content-Encoding header, and the file downloaded will thus be compressed. (This is consistent, but weird, and it seemingly forces me to do a redundant decompress-compress round trip.)
2. data should be compressed, to be consistent with the header. The implementation will not modify the bytes when downloading. (But this is inconsistent with Responses that came from calling fetch()!)
3. This is an error. Specifying a Content-Encoding header here is incorrect. Consider Content-Type: application/gzip instead. (This seems generally unfortunate.)
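
The constructor side of this question can be checked directly. Below is a minimal sketch, assuming a runtime that exposes CompressionStream as a global (recent browsers, Node 18+); gzipBytes and makeGzippedResponse are made-up helper names. It only shows that new Response() stores the bytes it is given and treats Content-Encoding as an ordinary header. It says nothing about what a browser does with such a Response after respondWith().

```javascript
// Compress a string to gzip bytes with the standard CompressionStream API.
async function gzipBytes(text) {
  const compressed = new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(compressed).arrayBuffer());
}

// Hypothetical helper: a Response whose body is already gzipped,
// labeled with the same headers as in the question above.
async function makeGzippedResponse(text) {
  const data = await gzipBytes(text);
  return new Response(data, {headers: {
    'Content-Type': 'text/plain',
    'Content-Encoding': 'gzip',
    'Content-Disposition': 'attachment; filename="file.txt.gz"'
  }});
}
```

Reading the body back with response.arrayBuffer() yields the gzip bytes unchanged, so at this level the header and the body are independent of each other.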

(I'm posting this question against the fetch spec since it is where the Response class is specified, but the problem only seems to come up in the context of ServiceWorkers.)

kentonv commented 7 years ago

Hmm, it turns out that, to my surprise, browsers (Chrome at least) actually decompress files that have Content-Encoding when saving to disk (e.g. due to Content-Disposition). I always thought of Content-Encoding as being something that's not supposed to be handled until the data is to be consumed, unlike Transfer-Encoding which is clearly meant to be handled at the HTTP layer. It seems instead that browsers will automatically decode Content-Encoding in all cases I can think of.

I guess that this argues that, for the case of ServiceWorkers and Responses, my concern is moot. ServiceWorkers do not specify any means by which a Response can be sent back out on the network. It only exists within the browser, and within cache which is managed by the browser. So the Content-Encoding header from this point is just saying what encoding was used by the network transport (if the response ever crossed a network).

I'm working with a use case, though, where I'm trying to implement a proxy server which runs ServiceWorker-like code, so I actually need to figure out how to push a Response out over the network.

I suppose even for ServiceWorkers, the same question comes up with the Request object. What happens if I want to send a Request with a Content-Encoding: gzip body? Do I compress the content before giving it to fetch()? Or does it do it for me?

kentonv commented 7 years ago

In my tests in Chrome, as expected, Request does not automatically encode the body. If you specify Content-Encoding: gzip, you must pass already-compressed data as the body option to new Request() (or to fetch()).
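
A rough sketch of what that looks like in practice, assuming CompressionStream is available as a global (recent browsers, Node 18+). The helper names gzipText and buildGzippedRequest, and any URL passed in, are illustrative only:

```javascript
// Gzip a string ourselves, since Request will not do it for us.
async function gzipText(text) {
  const compressed = new Blob([text]).stream().pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(compressed).arrayBuffer());
}

// Hypothetical helper: build a POST whose body is already compressed
// and whose Content-Encoding header describes that body.
async function buildGzippedRequest(url, text) {
  const body = await gzipText(text);
  return new Request(url, {
    method: 'POST',
    headers: {
      'Content-Type': 'text/plain',
      'Content-Encoding': 'gzip'
    },
    body
  });
}
```

The resulting Request can be handed to fetch() as usual. Whether the receiving server accepts gzipped request bodies is a separate matter and has to be known in advance.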

This seems to be what the spec calls for, but it's unfortunate that it means Request and Response are inconsistent with each other.

Thoughts?

annevk commented 7 years ago

So your main problem is with step 16 of https://fetch.spec.whatwg.org/#concept-http-network-fetch I suppose. That makes some of the things inconsistent here. We could provide dedicated support for gzip somehow, similarly to how it's tightly coupled in browsers, but that wouldn't necessarily address future schemes, if any.

One thing we could do is provide a request flag to disable step 16, but last I heard that would be fairly involved. And it's also fairly low-level and still makes you responsible for any compressing work.

kentonv commented 7 years ago

I think for now I will specify that Response bodies are expected to be decompressed, and if you specify a Content-Encoding on an outgoing Response, then the system is expected to apply the encoding for you when writing out to network.

Meanwhile, I can do clever optimizations where I can detect if a Response body passes through verbatim, and avoid the compression round trip in that case.

There are two annoyances:

1. If someone embeds a static asset in their code, they can't embed it already-compressed. It will be compressed on-the-fly every time it is sent. Or, similarly, if someone, say, connected upstream using a non-HTTP protocol (WebSocket, perhaps) and pulled down compressed assets, they wouldn't be able to send those out in a Response without recompressing. These use cases are probably rare, though, and we can potentially provide a non-standard API to cover them if needed. Maybe new Request() could take an additional init option called encodingAlreadyApplied or something.
2. As I mentioned above, handling of Content-Encoding in Requests is inconsistent with Responses. But setting Content-Encoding on a request is relatively rare.
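
To make the policy concrete, here is a minimal sketch of the serialization step, not the actual proxy code. encodeForWire is a hypothetical helper, the encodingAlreadyApplied flag is modeled as an option on that helper rather than on a constructor, and only a single gzip coding is handled. It assumes CompressionStream is available as a global (recent browsers, Node 18+).

```javascript
// Hypothetical serializer for a proxy that follows the policy above:
// Response bodies are held uncompressed, and Content-Encoding is applied
// only when the response is written out to the network.
async function encodeForWire(response, {encodingAlreadyApplied = false} = {}) {
  const encoding = (response.headers.get('content-encoding') || 'identity')
    .trim()
    .toLowerCase();

  // No body at all: nothing to encode.
  if (response.body === null) return new Uint8Array(0);

  // Pass the bytes through untouched when no coding is requested, or when
  // the caller says the body already matches the header.
  if (encodingAlreadyApplied || encoding === 'identity') {
    return new Uint8Array(await response.arrayBuffer());
  }

  // This sketch handles a single gzip coding only; chains and other
  // codings are rejected rather than guessed at.
  if (encoding !== 'gzip') {
    throw new Error(`unsupported Content-Encoding: ${encoding}`);
  }

  const compressed = response.body.pipeThrough(new CompressionStream('gzip'));
  return new Uint8Array(await new Response(compressed).arrayBuffer());
}
```

With the flag set, a pre-compressed static asset goes out verbatim, which is the escape hatch the first annoyance asks for.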

wanderview commented 7 years ago

What is an "outgoing Response"? Do you mean Request?

kentonv commented 7 years ago

@wanderview No, I mean Response. I have a use case involving an HTTP proxy that applies ServiceWorker-like scripts, so Responses need to be serialized back out to the network.

wanderview commented 7 years ago

Ok, but that is a bit different from fetch API as specified for the browser. This seems like something you could fix in your proxy implementation.

kentonv commented 7 years ago

Yes, certainly in my use case I could deviate from the standard if needed. Just hoping to avoid it if at all possible. Less documentation to write that way. :)

And the Request / Response inconsistency strikes me as odd even in the browser case, though I suppose there's not much that can be done there at this point.

wanderview commented 7 years ago

Request.body could probably be changed to honor content-encoding in a best-effort way if it made sense.

lijunle commented 6 years ago

Hi, all.

I have a question on the request encoding side. At the moment, there is no way to ask the browser/runtime to encode the request content when calling the fetch API. Is that right?

For logging, native gzip encoding in the browser/runtime would be very useful.

annevk commented 6 years ago

@lijunle hey, that is correct. I think we'd best track that as a distinct request. Would you mind filing a new issue? Please also mention whether you just want gzip or also other formats, such as Brotli.

lijunle commented 6 years ago

@annevk Sure! Here it goes: https://github.com/whatwg/fetch/issues/653

determin1st commented 5 years ago

I bet all of this Content-Encoding parsing should be left to implementors and kept external to the fetch APIs: no automatic encoding or decoding. Because now you have "broti", tomorrow it will be "shmroti", or maybe some "aes128gsm", and also chains like "gzip, smzip, bunzip, gzip_again"...

whatwg / fetch

Constructing a Response with Content-Encoding? #589