How to reproduce:
Run the test case testWriteCloseWriteCloseRead(). In pseudocode:
Write some data to a file with an LZ4BlockOutputStream and close the stream.
Write some more data to the same file with a new LZ4BlockOutputStream and close the stream.
Read all of the data with a single instance of LZ4BlockInputStream.
/**
 * Write and close two stream instances to the same file. Read the entire data with one
 * LZ4BlockInputStream.
 */
@Test
public void testWriteCloseWriteCloseRead() throws IOException {
    final byte[] testBytes = "Testing!".getBytes(Charset.forName("UTF-8"));

    // Write the first time
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    LZ4BlockOutputStream out = new LZ4BlockOutputStream(bytes);
    out.write(testBytes);
    out.close();

    // Write the second time, appending to the same underlying bytes
    out = new LZ4BlockOutputStream(bytes);
    out.write(testBytes);
    out.close();

    // Read everything back through a single LZ4BlockInputStream
    ByteArrayInputStream in = new ByteArrayInputStream(bytes.toByteArray());
    LZ4BlockInputStream lz4In = new LZ4BlockInputStream(in);
    DataInputStream dataIn = new DataInputStream(lz4In);

    byte[] buffer = new byte[testBytes.length];
    dataIn.readFully(buffer);
    assertArrayEquals(testBytes, buffer);

    // in.skip(LZ4BlockOutputStream.HEADER_LENGTH); // The test only passes if these 21 bytes (the footer) are skipped

    buffer = new byte[testBytes.length];
    dataIn.readFully(buffer); // throws java.io.EOFException unless the footer was skipped
    assertArrayEquals(testBytes, buffer);
}
Actual:
A java.io.EOFException is thrown.
Expected:
All of the data written by both streams should be read and returned.
Analysis:
The LZ4BlockOutputStream writes a header, the compressed data and a footer. The footer is very similar to the header. Two LZ4BlockOutputStreams writing to the same file will therefore produce this:
Header | Compressed Data | Footer | Header | Compressed Data | Footer
A single instance of LZ4BlockInputStream reads the first header and the compressed data. If the user tries to read more data, it tries to read a header again, but since it has not skipped the previous footer it reads the footer instead. The footer, although similar to the header, contains a length of 0, so the read() method returns -1 and the DataInputStream throws an EOFException.
If the user manually skips 21 bytes (the length of the header/footer), the LZ4BlockInputStream will happily continue to read another "frame" (see the commented-out line in the test case).
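To make the analysis above concrete, here is a toy model of the layout. It is only a sketch and not the lz4-java wire format: the class ToyFrames, its methods, and the 4-byte length prefix are all made up for illustration, and nothing is compressed. The "footer" is modelled as a header-like record with length 0, which is the property that matters here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Toy model of Header | Data | Footer frames. Hypothetical; NOT the real lz4-java format. */
final class ToyFrames {

    /** Writes one frame: a length prefix ("header"), the payload, and a zero-length "footer". */
    static void writeFrame(DataOutputStream out, byte[] payload) throws IOException {
        out.writeInt(payload.length); // header: payload length
        out.write(payload);           // data (left uncompressed in this toy)
        out.writeInt(0);              // footer: looks like a header, but with length 0
    }

    /** Builds a "file" containing the same payload written as two separate frames. */
    static byte[] twoFrames(byte[] payload) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeFrame(out, payload);
        writeFrame(out, payload);
        out.flush();
        return bytes.toByteArray();
    }

    /**
     * Reads payload bytes from the file.
     * continueAfterFooter == false: the first footer is treated as end of stream
     * (the behaviour described in this report).
     * continueAfterFooter == true: the footer is consumed and reading continues
     * with the next frame while input remains.
     */
    static byte[] readAll(byte[] file, boolean continueAfterFooter) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        while (in.available() > 0) {
            int length = in.readInt();
            if (length == 0) {
                if (continueAfterFooter) {
                    continue; // footer consumed, look for another frame
                }
                break; // footer interpreted as end of stream
            }
            byte[] payload = new byte[length];
            in.readFully(payload);
            result.write(payload);
        }
        return result.toByteArray();
    }
}
```

With the bytes of "Testing!" written as two frames, readAll(file, false) yields only the first payload, while readAll(file, true) yields both payloads back to back.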
Workaround:
The user can manually call in.skip(21).
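One caveat with this workaround: InputStream.skip(n) is allowed to skip fewer than n bytes, so a single in.skip(21) call is not guaranteed to skip the whole footer on every kind of stream. Below is a minimal sketch of a helper that loops until the requested number of bytes has been skipped. The class and method names (StreamSkipper, skipFully) are my own and not part of lz4-java.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper; not part of lz4-java. */
final class StreamSkipper {

    /** Skips exactly n bytes, or throws EOFException if the stream ends first. */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else if (in.read() >= 0) {
                // skip() made no progress, so consume a single byte instead
                remaining--;
            } else {
                throw new EOFException("Stream ended with " + remaining + " bytes left to skip");
            }
        }
    }
}
```

In the test case above this would be called as StreamSkipper.skipFully(in, 21) on the underlying stream, in place of the commented-out in.skip(...) line.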
Suggested fix:
I think it would be appropriate for an LZ4BlockInputStream to consume all bytes related to one frame; that is, the footer should be consumed when the end of the frame has been reached.
I'm guessing the solution might be a bit trickier because the footer is related to the frame and the header to the block? (I'm probably using the terms block and frame incorrectly.)
Another approach would be to just say that this should not be possible. But this “feature” works with a normal GZIPOutputStream/GZIPInputStream so it would be good if it also works with LZ4.
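For comparison, here is the same write-close-write-close-read pattern with the JDK's gzip streams, using only java.util.zip. The class name GzipConcatDemo and its methods are just for this sketch. A single GZIPInputStream reads both concatenated gzip members.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Sketch of the gzip equivalent of the LZ4 test case; names are illustrative. */
final class GzipConcatDemo {

    /** Writes data through two separate GZIPOutputStreams into the same byte buffer. */
    static byte[] writeTwice(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < 2; i++) {
            // Closing the gzip stream finishes one gzip member; closing a
            // ByteArrayOutputStream has no effect, so it can be written to again.
            try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
                out.write(data);
            }
        }
        return bytes.toByteArray();
    }

    /** Reads everything back with one GZIPInputStream. */
    static byte[] readAll(byte[] compressed) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) {
                result.write(buffer, 0, n);
            }
        }
        return result.toByteArray();
    }
}
```

Calling readAll(writeTwice(testBytes)) returns the test bytes twice in a row, which is the behaviour I would expect from LZ4BlockInputStream as well.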