Closed mikicho closed 1 month ago
Thanks, @mikicho!
I believe the answers to both of your questions are in setting up an HTTP server and taking a look at what happens to compressed bodies sent from the server. We can request the same endpoint in the browser and in Node.js, and see whether they handle that at all.
But overall, yes, Interceptors ships browser-compatible interceptors, so we can't run things like `zlib` in the `FetchInterceptor`. The mental model behind the request listeners (and MSW handlers by extension) is to act as a server route handler. If you need to pull in an extra library to decompress the request body in real life, you need to do the same with Interceptors. Since we are not a convenience library, we don't need to worry about anything outside of that.
Some related stuff I found:
Gave this a try in Node.js:
```js
import fs from 'fs'
import http from 'http'
import zlib from 'zlib'

const server = http.createServer((req, res) => {
  // Compress the response body.
  if (req.method === 'GET' && req.url === '/resource') {
    res.writeHead(200, {
      'content-type': 'text/plain',
      'content-encoding': 'gzip',
    })
    // Create the read stream per request so repeated requests work.
    fs.createReadStream('./payload.txt').pipe(zlib.createGzip()).pipe(res)
    return
  }

  res.statusCode = 404
  res.end()
})

server.listen(3000, async () => {
  console.log('Server is running on http://localhost:3000')

  const response = await fetch('http://localhost:3000/resource', {
    headers: { 'Accept-Encoding': 'gzip' },
  })
  console.log(await response.text())
})
```
I receive the file's content decoded. I don't do the compression though.
@kettanaito
> I receive the file's content decoded. I don't do the compression though.

What do you mean? You do compress the response in the server.
@kettanaito I added an unmocked test which passes.
Thank you, @mikicho! Will look at this.
We do need to add `Content-Encoding` handling to the mocked responses in the fetch interceptor. Before proceeding, I've opened an issue in Undici to see if they would be willing to abstract their handling into a utility function we can consume directly: https://github.com/nodejs/undici/issues/3412. That would be the best case scenario.
If not, we can integrate their handling into Interceptors manually, although that'd be a repetition I'd rather not introduce.
I realized we still need to duplicate Undici's decompression handling because it's unlikely to be backported to Node v18, or even v20, at this point. I certainly don't have the capacity to see those backports through.
The handling itself is rather straightforward; it's just nice to be consistent. We can afford that nicety once Node.js v22 becomes the minimum supported version across MSW's toolchain.
@mikicho, if you have time, can you give me a hand with this? I've already abstracted the decompression logic in the Undici PR, making it work better with `ReadableStream`:
We basically need this `decompress` function and the surrounding utilities. We can also merge it all into a single function; there's no big value in keeping it separated.
We can also consider using the standard Compression Stream API instead of relying on Undici. I will explore this option.
Opened a pull request that utilizes the Compression Streams API to decompress fetch responses: https://github.com/mswjs/interceptors/pull/661. This works for GZIP and Deflate, but has no Brotli support.
@mikicho, would be curious to hear your thoughts on this one.
Edit: Supporting Brotli seems like a lot of work. We can either release decoding without it (just GZIP and Deflate) and work on it in the background, or block this feature until we find a way to make Brotli happen in Node.js and the browser (it is possible via `brotli-wasm`, it just needs to be distributed properly).
@kettanaito I think we should check the environment and, if running in Node.js, use `zlib`; otherwise, use a browser-compatible library.
WDYT?
@mikicho, that may be problematic. The environment check is fine; the module difference is the problem. We either need to ship two different `FetchInterceptor`s for Node.js and the browser, or somehow hack a way to import a different implementation of the decompression.
That's non-trivial, unfortunately, since we have to support both CJS and ESM at the moment. Dynamic requires will throw when compiling to ESM (imports must be static). We can give this a try.
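For reference, the environment check itself is the easy half; a common heuristic (not Interceptors' actual code) looks like this:

```javascript
// Runtime environment detection: true only when running under Node.js.
// This is an illustrative heuristic, not code from the library.
const isNode =
  typeof process !== 'undefined' &&
  process.versions != null &&
  typeof process.versions.node === 'string'

// In Node.js we could lazily load zlib; in the browser we'd need a
// web-compatible decompressor, which is the hard part discussed above.
console.log(isNode) // true when run in Node.js
```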
The problem remains that the browser-compatible, web-stream-friendly Brotli decompression is non-existent right now. brotli-wasm
is our best bet, if they agree to fix the export conditions and also bundle for Node.js ESM.
@kettanaito I see.
I see it is a bit hacky, but what about a different `tsconfig` with a path alias?
I like this approach because it may elegantly solve other environment-specific code problems that come up in the future.
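For illustration, the path-alias idea could look roughly like this (the file name, alias, and module paths below are hypothetical, not from the actual codebase):

```jsonc
// tsconfig.node.json (hypothetical): resolve the same alias to a
// Node.js-specific module when building the Node bundle.
{
  "extends": "./tsconfig.json",
  "compilerOptions": {
    "baseUrl": ".",
    "paths": {
      "~/decompression": ["./src/utils/decompression.node.ts"]
    }
  }
}
```

A browser-targeted `tsconfig` would map the same alias to a web-compatible implementation.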
I've finished the decompression support, please see the conclusion here: https://github.com/mswjs/interceptors/pull/661#issuecomment-2429045746.
Will merge once the tests are passing.
@kettanaito LGTM
Just curious about `PipelineStream`. I'm not sure why we need this.
Thanks for working on this!
@mikicho, I added `PipelineStream` as an alternative to Node.js's `pipeline`, but for web streams. We use it to compose a dynamic list of `TransformStream`s (decompressors) into a chain of streams to apply to the response body.
Do you see a better approach here?
Ohh, OK! I see. I'm not sure I'd stick with the Web API here (`DecompressionStream`) because AFAIU this code is only used in Node.js, but the current implementation is great too.
It's good to be consistent across the browser and Node.js, and `DecompressionStream` is available in both of those environments. We are also working with a `ReadableStream`, which is the response body. It's easier to convert `zlib` to a web stream than to convert everything else to Node.js streams.
@kettanaito I think there is something off with the decompression order... still looking to understand the root cause (if any).
Right now, we're applying decompression left-to-right based on the `content-encoding` header:
Maybe that's the problem?
Basically:

```
content-encoding: gzip, br
```

The decompression chain would be: `input -> DecompressionStream('gzip') -> BrotliDecompression -> output`.
This behavior looks correct; I'm not sure the code does it.
From what I see:

```js
compressResponse(['gzip', 'br'])('hello world').toString('base64') // H4sIAAAAAAAAA+NmbchIzcnJVyjPL8pJYQYAhavvgA8AAAA=
```

produces a string that is br -> gzip, instead of gzip -> br.
On the contrary:

```js
zlib.brotliCompressSync(zlib.gzipSync('hello world')).toString('base64') // Cw+AH4sIAAAAAAAAA8tIzcnJVyjPL8pJAQCFEUoNCwAAAAM=
```
Your example is incorrect. `compressResponse` behaves similarly to how a server handles the `Content-Encoding` header. It lists all the encodings in the order they were applied, left-to-right. So, for your example, the order would be gzip first, then brotli.
Decompression must be reversed, and there's a bug there currently that I will fix soon.
**`DecompressionStream` in Node.js v18–20**

For some reason, instances of `DecompressionStream` in Node.js v18–20 are printed as `undefined` when logging them out. They are still correct transform streams, just not observable in the console. Extremely confusing when debugging.
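A quick repro sketch (run it on Node.js v18–20 to see the quirk; the stream works regardless of what the console shows):

```javascript
const ds = new DecompressionStream('gzip')

console.log(ds) // may print as undefined-looking output on Node.js v18–20
console.log(ds.readable instanceof ReadableStream) // true on all versions
```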
IIRC, the `Content-Encoding` header lists encodings "in the order they were applied", so the string is first compressed with gzip and only then with br, so the decompression should go backward.
Here's a fun find
I saw that too lol. Seems like `20.18.0` (latest) works as expected.
> Your example is incorrect.

Why not? If I compressed with `gzip, br`, I expect to decompress with br -> gzip. Isn't that right?
I think we mean different things by `->`. I mean order; you seem to mean relation. Composition is easier:

```
Content-Encoding: gzip, br
br(gzip(data))
```
@mikicho, I've pushed the fix that applies the decompression right-to-left to respect the correct order of multiple compressions. Mocked tests pass, but I have two bypass tests failing:
The error seems to be coming from Undici being unable to handle that compression:

```
TypeError: terminated
 ❯ Fetch.emit ../node:events:518:28
Caused by: Error: incorrect header check
 ❯ Zlib.zlibOnError [as onerror] ../node:zlib:189:17

Serialized Error: { errno: -3, code: 'Z_DATA_ERROR' }
```
Our interceptor is not involved here at all. Am I constructing an invalid compressed response with `compressResponse()`? 🤔 I've fixed that function, too, to handle the correct order of compression.
I can confirm that `compressResponse(['gzip', 'deflate'], 'hello world')` returns the same buffer as `zlib.deflateSync(zlib.gzipSync('hello world'))`:

```
decompressResponse: eJyT7+ZgAAPh0x5nT54M1zivf8qTkaFV0IuXGygMAICXB8E=
zlib:               eJyT7+ZgAAPh0x5nT54M1zivf8qTkaFV0IuXGygMAICXB8E=
```
I fixed the `compose` function by removing it. The compose order is correct.
The issue is in Undici, I can reproduce it standalone as well. Will report it to their repo.
Edit: Opened an issue at https://github.com/nodejs/undici/issues/3762.
Since the behavior is reproducible in isolation, our side of the implementation seems to be done. The intention of the bypass tests is to ensure the interceptor doesn't interfere with the regular behavior of the request client, but in this case the request client itself has a bug, so we cannot allow that to influence our tests.
I will skip those two scenarios and link the issue above them. Will do that after we discuss the issue with the Undici folks.
This looks good to me. @mikicho, should we merge this?
🚀
This has been released in v0.36.6!
Make sure to always update to the latest version (`npm i @mswjs/interceptors@latest`) to get the newest features and bug fixes.
Predictable release automation by @ossjs/release.
Added failing test.
@kettanaito

- `interceptors` is cross-platform and AFAIK `zlib` does not exist in the browser. How does the browser decompress the response body? Am I missing something?
- `IncomingMessage` does not decompress the response body, and Node.js leaves this to the HTTP libraries (Axios, got, etc.). So, I think we should follow this behavior and leave the response body compressed.

Summary:

- `zlib` and a custom transform stream to support Brotli decompression in Node.js.