lz4 / lz4-java

LZ4 compression for Java
Apache License 2.0

Data larger than 4194304 Bytes is not decompressed correctly #203

Open mgrundie-r7 opened 1 year ago

mgrundie-r7 commented 1 year ago
    Random random = new Random();
    byte[] data = new byte[4194304];  // 4194304 bytes = 4 MiB, the boundary named in the issue title
    random.nextBytes(data);

    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

    LZ4FrameOutputStream lz4FrameOutputStream = new LZ4FrameOutputStream(byteArrayOutputStream);
    lz4FrameOutputStream.write(data);
    lz4FrameOutputStream.close();

    byte[] compressedBytes = byteArrayOutputStream.toByteArray();
    LZ4FrameInputStream compressedInputStream = new LZ4FrameInputStream(new ByteArrayInputStream(compressedBytes));

    byte[] decompressedData = new byte[data.length];
    compressedInputStream.read(decompressedData);
    compressedInputStream.close();

    assertTrue(Arrays.equals(data, decompressedData));

The above assertion passes. If you change the code to data = new byte[4194305]; the assertion fails, and the 4194305th element of decompressedData (decompressedData[4194304]) is 0.
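A quick way to see what is happening (a minimal check, assuming the same setup as above with data = new byte[4194305]) is to look at the value the single read() call returns:

    byte[] decompressedData = new byte[data.length];
    int bytesRead = compressedInputStream.read(decompressedData);
    // bytesRead is 4194304 here, not 4194305: a single read() returns at most the
    // contents of one decompressed frame block, so the last byte is never written
    // and decompressedData[4194304] keeps its default value of 0.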

petoncle commented 1 year ago

That's because you are decompressing only one block (you should call compressedInputStream.read() more than once).
Instead of:

byte[] decompressedData = new byte[data.length];
compressedInputStream.read(decompressedData);

Try:

byte[] decompressedData = compressedInputStream.readAllBytes();
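If the caller has to fill a fixed-size buffer rather than use readAllBytes(), the same fix can be written as a plain read loop; a minimal sketch, assuming the same decompressedData buffer as above:

byte[] decompressedData = new byte[data.length];
int total = 0;
while (total < decompressedData.length) {
    int n = compressedInputStream.read(decompressedData, total, decompressedData.length - total);
    if (n < 0) {
        break; // end of stream before the buffer was filled
    }
    total += n;
}

On Java 9 and later, compressedInputStream.readNBytes(decompressedData, 0, decompressedData.length) performs the same loop internally.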
mgrundie-r7 commented 1 year ago

I can't control what the callers do, and the callers have no specific knowledge of which type of InputStream this is. I ended up writing an adapter that extends InputStream and delegates to LZ4FrameInputStream, adding some reset capability and chunking the read when a single read() is asked for more than 4194304 bytes.
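A minimal sketch of the kind of delegating adapter described above, assuming the chunk-and-loop behavior (the class name and constant are hypothetical, and the reset support mentioned is not shown):

import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical adapter: wraps the LZ4FrameInputStream and keeps calling read()
// on it, in chunks of at most 4194304 bytes, until the caller's buffer is
// filled or the stream ends.
public class ChunkedLz4InputStream extends FilterInputStream {
    private static final int MAX_CHUNK = 4194304; // 4 MiB, one frame block

    public ChunkedLz4InputStream(InputStream lz4FrameInputStream) {
        super(lz4FrameInputStream);
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int chunk = Math.min(len - total, MAX_CHUNK);
            int n = in.read(b, off + total, chunk);
            if (n < 0) {
                // end of stream: report what was read so far, or -1 if nothing
                return total == 0 ? -1 : total;
            }
            total += n;
        }
        return total;
    }
}

With this wrapper, a caller that issues one large read(decompressedData) gets the full requested range (or a short count only at end of stream), regardless of the underlying frame block size.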