httptoolkit / brotli-wasm

A reliable compressor and decompressor for Brotli, supporting node & browsers via wasm
Apache License 2.0
264 stars 21 forks source link

Leaking memory when used streaming #23

Closed william00179 closed 1 year ago

william00179 commented 1 year ago

Hi there!

I have an application using multiple web-workers to compress files and upload them and after switching to brotli-wasm we started to see browser crashes due to out of memory errors. Usually we will be uploading a few thousand files at a time, but we are seeing crashes after just 100 or so files. Tracking down the memory usage, we can see that for each web worker, brotli-wasm is using an excessive amount of memory (and it keeps growing until it crashes)

image

This seems to come from the native WASM memory mapped between the WASM and JS memory.

I tried using the code provided in your readme but found that the files would be corrupted so have implemented the stream like

const compressStream = new brotli.CompressStream(3);
const compressionStream = new TransformStream({
  start() {},
  transform(chunk, controller) {
    let first = true;
    do {
      controller.enqueue(
        compressStream.compress(
          first ? chunk : chunk.slice(compressStream.last_input_offset()),
          1024 ** 2
        )
      );
      first = false;
    } while (
      compressStream.result() ===
      brotli.BrotliStreamResult.NeedsMoreOutput
    );
  },
  flush(controller) {
    if (
      compressStream.result() === brotli.BrotliStreamResult.NeedsMoreInput
    ) {
      controller.enqueue(compressStream.compress(undefined, 1024 ** 2));
    }
    controller.terminate();
  },
});

Which works very well until the memory runs out.

I've not been able to locate a cause for this memory leak except for something going wrong in the rust code itself or the wasm wrapper

pimterry commented 1 year ago

Hmmm, I'm really not sure what could cause this I'm afraid @william00179. Streaming support was an extra feature added by @myl7 in #11, and I don't actually use it myself - I built brotli-wasm purely to use the direct compress/decompress API really.

If in your investigations you do find more clues towards exactly what's causing this, then do please share more info, and of course I'd be very happy to look at a PR if you can find a fix.

pimterry commented 1 year ago

By the way, this has just nudged me into properly looking into the README example for web streams, and I found a few issues there that cause problems once you start using these for non-trivial examples.

I've now updated that in the README - you might want to test out the new version, since I think it resolves the issue your modified version handles (compression running out of output) more simply (by using the compression stream's existing internal buffer), and fixes some other bugs, which might plausibly resolve the memory issues you're seeing.

william00179 commented 1 year ago

Hi @pimterry thanks for having a look at that so quickly.

Unfortunately, I'm still seeing the same behavior when using that new code.

After uploading < 100 images each at about 512KB each the process is almost using over 3GB of memory. The problem is made worse because each worker leaks the memory so it compounds the issue quickly as it will use the number of cores available minus one.

In the case below I uploaded 100 images, each being exactly 513.92KiB.

Each worker will receive an image to upload in a round robin fashion so each worker should have only received approx 4662 KiB of uncompressed data. Yet we see almost 400MB of shared memory being used here between WASM and JS on just this one worker VM.

image
pimterry commented 1 year ago

Do you see the same behaviour if you use the non-streaming APIs?

I think this issue is very likely to be a detail of the streaming API specifically, but it'd be good to confirm that so we can at least be precise about where the problem is.

Unfortunately, beyond that debugging this is going to require some in-depth Rust knowledge, and I'm not sure I can help much! PRs to fix this are very welcome if you can trace down the issue.

myl7 commented 1 year ago

The flush part should check NeedsMoreOutput other than NeedsMoreInput, since there is already no input. https://github.com/httptoolkit/brotli-wasm/blob/b5c095da32a2c646b6e8c1771d83f437c1777e35/src/stream.rs#L144-L149 shows the logic in Rust. When flushing (chunk input is null / undefined), only NeedsMoreOutput or ResultSuccess are returned.

As for the code in the issue, the above means the stream is terminated without actually flushed, so the left input buffer causes the memory leak. I am not completely sure about that so maybe you need to recheck if the below code can work. I think the TransformStream should be:

const compressStream = new brotli.CompressStream()
const compressionStream = new TransformStream({
  start() {},
  transform(chunk, controller) {
    do {
      controller.enqueue(compressStream.compress(chunk.slice(compressStream.last_input_offset()), 1024 ** 2))
    } while (compressStream.result() === brotli.BrotliStreamResult.NeedsMoreOutput)
  },
  flush(controller) {
    do {
      compressStream.compress(undefined, 1024 ** 2)
    } while (compressStream.result() === brotli.BrotliStreamResult.NeedsMoreOutput)
    controller.terminate()
  },
})

@pimterry Sorry I did not ship an example of streaming in the PR, and that seems to make many users confused. If the above TransformStream code is tested fine, do you like to add it to README or, which I would prefer more, let me add an example folder with playable example projects?

pimterry commented 1 year ago

Thanks @myl7! @william00179 if you can confirm that that works that'd be very helpful.

@myl7 regarding examples - yes, I think creating an examples folder and putting working streaming examples in there would be very helpful! If you have some time, a PR for that would be great (once we've confirmed that the code above now works correctly).

william00179 commented 1 year ago

Hi @pimterry and @myl7,

From testing this example above, when trying to decompress the brotli encoded files, the decompression fails with an unexpected EOF.