oven-sh / bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one
https://bun.sh
Other
74.16k stars 2.77k forks source link

`deflateSync` produces binary that `inflateSync` fails to decompress #8886

Open MarByteBeep opened 9 months ago

MarByteBeep commented 9 months ago

What version of Bun is running?

1.0.26

What platform is your computer?

Microsoft Windows NT 10.0.22631.0 x64

What steps can reproduce the bug?

const buffer = new Uint8Array([0x12, 0x01, 0x3, 0x5, 0x5]);

{
    const compressed = Bun.deflateSync(buffer, { level: 9, windowBits: -15 });

    console.log(compressed);

    const decompressed = Bun.inflateSync(compressed);

    // decompressed isn't a Uint8array but an error: invalid stored block lengths
    console.log(decompressed);
}

{
    const compressed = Bun.deflateSync(buffer, { level: 9, windowBits: -15 });

    console.log(compressed);

    const decompressed = Bun.gunzipSync(compressed);

    // decompressed works fine
    console.log(decompressed);
}

What is the expected behavior?

A binary compressed with deflateSync should always be decompressible with inflateSync. When using the { level: 9, windowBits: -15 } settings, it is not.

However it IS decompressible with gunzipSync.

So we end up with the weird situation in which you have to use gunzipSync to decompress and deflateSync to compress.

What do you see instead?

decompressed is error: invalid stored block lengths

Additional information

As per discord discussion, might be related to https://github.com/oven-sh/bun/pull/8529

Also I ran into: https://github.com/oven-sh/bun/issues/6401 as well, the docs seem to be incorrect.

argosphil commented 9 months ago

You're passing a windowBits parameter value of -15.

From the zlib documentation at https://www.zlib.net/manual.html:

The windowBits parameter is the base two logarithm of the window size (the size of the history buffer). It should be in the range 8..15 for this version of the library. Larger values of this parameter result in better compression at the expense of memory usage. The default value is 15 if deflateInit is used instead.

Internally, a negative value of windowBits is used to switch between gzip and zlib compression modes. We should catch negative values in opts.windowBits and throw an error, but that's about all we can do.

MarByteBeep commented 9 months ago

That then creates a new problem: my original binary is compressed and starts with a78 DA header, which is a perfectly valid and common zlib header. I need to be able to recompress said data and end up with that header again. The only setting that did that was windowBits -15. How am I able to recreate the original if you simply prevent that value from being passed?

MarByteBeep commented 9 months ago

Also, any idea why inflateSync fails to decompress a zlib binary with said header, while gunzipSync works? That feels a bit counterintuitive to me 😊

argosphil commented 9 months ago

Internally, a negative value of windowBits is used to switch between gzip and zlib compression modes. We should catch negative values in opts.windowBits and throw an error, but that's about all we can do.

I was entirely and completely wrong. My apologies.

The actual problem is that inflateSync calls zlib like this:

        var reader = zlib.ZlibReaderArrayList.initWithOptions(compressed, &list, allocator, .{
            .windowBits = -15,
        }) catch |err| {

and you'd like to call it with .windowBits = 15. So this almost looks like my favorite kind of bugfix, those that involve removing or changing a single character.

But it's not: the existing code is right for the "raw deflate" format that zlib produces. Per the guidance of the zlib manual, we should default to expecting a zlib header, and assume it is absent only when explicitly told so. But right now, there's no option for that. We could simply check whether there's a valid zlib header, but that would still have many false positives.

So, technically, everything is as it should be, except for the function names. inflateSync should be inflateRawDeflatedDataSync, and used very rarely. Inflating zlib data and decompressing gzip files can safely share a function, since in those very rare cases where you want to accept only one of those formats, it's an acceptable burden to just check for the gzip magic number yourself.

Of course there's no way to guess that without improved API documentation.

Again, I'm sorry I was wrong. In summary, there are three compressed formats only two of which are known to be disjoint. Both (may) overlap with the third.

paperdave commented 2 months ago

ive found that this affects bun.report's ability to decompress some panic messages, though the node:zlib polyfill does not have this issue (but is slower).

in addition to this issue, since Bun 1.1.21, more payloads dont seem to be decompressable:

expect(Bun.gunzipSync(Buffer.from('eNrLzCvLz05NUUguSizOcKoMSMzLTNbQVMhIzEvJSS0CAK/LCxc', 'base64url')).toString('utf8')).toBe('invoked crashByPanic() handler');

above: