pierrec / node-lz4

LZ4 fast compression algorithm for NodeJS
MIT License

Bad Compression / Decompression using Pure JS version #12

Closed NathanaelA closed 10 years ago

NathanaelA commented 10 years ago

<Buffer 7b 22 61 64 64 72 65 73 73 22 3a 22 31 32 37 2e 30 2e 30 2e 31 22 7d> (Actual 23 bytes of Data is: {"address":"127.0.0.1"} )

var lz4 = require('./lz4/lib/binding'), fs = require('fs');
var data = fs.readFileSync('packet-test.txt');
var len = data.length;
var out = new Buffer(len + 4096);
var inF = new Buffer(len);

var sz = lz4.compress(data, out);
lz4.uncompress(out, inF);
res = data.toString('hex') === inF.toString('hex');
if (!res) console.log("Failed to Compress/Decompress");

This is a small packet, but my testbed has files ranging from 2 bytes to 48 MB. Of the 526 files in my compression testbed, 109 fail the compress<->decompress round trip. Some of them, like this one, end up at 0k after compression, and some are actually "bigger" than the source (hence the + 4096 on my out buffer). This is using the current pull as of today (3/24/2014).

NathanaelA commented 10 years ago

Erg -- it ate the raw buffer line: Buffer: 7b 22 61 64 64 72 65 73 73 22 3a 22 31 32 37 2e 30 2e 30 2e 31 22 7d

pierrec commented 10 years ago

Hello,

When you call binding.compress and binding.uncompress, you are using the "raw" compress/decompress functions (also called block functions), which the higher-level functions such as decode() and encode() are built on. They behave exactly like their C counterparts and are exposed because they are useful in some cases, but you have run into their main problem: you need to do more work after using them!

When using them, you need to check for their output. In the case of lz4.compress():

Note that the compressed data can indeed be bigger than the uncompressed one, again when the data is not compressible. So in order to initialize your output buffer you should use lz4.compressBound(data.length), which will return the maximum size the compressed buffer can be based on the size of the input data.
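To illustrate the advice above, here is a minimal round-trip sketch. It is not code from the library: roundTripBlock is a hypothetical helper, and `binding` is assumed to be an object exposing compressBound(), compress() and uncompress() as discussed in this thread (for example require('./lz4/lib/binding')). It also assumes that compress() returns the number of bytes written, with 0 meaning nothing was compressed, and that uncompress() returns the number of bytes restored.

```javascript
// Hypothetical helper, not part of node-lz4. `binding` is assumed to expose
// compressBound(), compress() and uncompress() as described in this thread.
function roundTripBlock(binding, data) {
  // Size the output from the input *length*, not from the buffer itself.
  var out = Buffer.alloc(binding.compressBound(data.length));

  // Assumption: the return value is the compressed size, 0 if not compressed.
  var compressedSize = binding.compress(data, out);
  if (compressedSize <= 0) {
    return { compressed: false, compressedSize: 0, ok: true };
  }

  // Hand uncompress() only the bytes that were actually written.
  var restored = Buffer.alloc(data.length);
  var restoredSize = binding.uncompress(out.slice(0, compressedSize), restored);

  return {
    compressed: true,
    compressedSize: compressedSize,
    // Assumption: uncompress() returns the number of bytes it restored.
    ok: restoredSize === data.length && restored.equals(data)
  };
}
```

It could be called as roundTripBlock(require('./lz4/lib/binding'), fs.readFileSync('packet-test.txt')). Taking the binding as a parameter keeps the sketch independent of how the module is loaded.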

See examples/file_compressBlock.js and examples/compress.html for examples of how to use the block functions.

As a final note, buffer sizes less than 13 bytes are not compressed as per the LZ4 specs.

Let me know if your 109 failed cases fit into those!

Pierre

NathanaelA commented 10 years ago

Thanks for the feedback.

I basically need to test the browser version of the JS (no native modules). Since the browser build/lz4.js version doesn't seem to work at all on Node, I used the binding version, since it is what your benchmark uses. (Also, trying to include lz4 appears to require xxhash, which I guess is supposed to be a compiled module of some sort, since it fails to load because I didn't compile anything.)

binding.compressBound always appears to return zero, so I'm still just hacking the output compression buffer size to be 4096 chars larger than the source size (which eliminates any error messages from binding.compress/.uncompress).

Based on your feedback, 8 of the 109 failures were Size=0, so 101 were actual (de)compression failures.

Grabbing one of the smaller files that failed; it is 88 chars long, compressed to 53, decompressed to 92 chars, and of course several characters are corrupted in the buffer.

To Test: grab everything between the Quotes. "R0lGODlhDAAMAIAAAGZmZv///yH5BAEAAAEALAAAAAAMAAwAAAIYjI8BmbBsHIwPSsXuPbrSj3QRKIrKYl4FADs="

pierrec commented 10 years ago

Thanks for pointing this out, there was indeed a bug in the JS compressBlock(), please try out v0.3.2.

I am unsure why the JS compressBound() always returns 0 though. Could you give me a real example please?

NathanaelA commented 10 years ago

Awesome, that fixed the de/compression issues. All files that can be compressed are being compressed/decompressed properly.

As for the JS compressBound -- I pass in the data buffer from any of my test files and it just returns 0.

var lz4 = require('./lz4/lib/binding'), fs = require('fs');
var data = fs.readFileSync('packet-test.txt');

var compSize = lz4.compressBound(data);
// compSize === 0 at this point no matter what data is

pierrec commented 10 years ago

Glad to hear that it fixes your issues :).

As for the compressBound() function, you need to pass in the buffer length, not the buffer itself.

var lz4 = require('./lz4/lib/binding'), fs = require('fs');
var data = fs.readFileSync('packet-test.txt');

var compSize = lz4.compressBound(data.length);