icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License

BZip2.Compress Never Finishes #218

Open HaydnTrigg opened 6 years ago

HaydnTrigg commented 6 years ago

Steps to reproduce

  1. Load the provided file (the pixel data from a PNG image; the original PNG is also provided).
  2. Pass the byte data through a MemoryStream into the BZip2.Compress function.

Code Sample:

byte[] blob_data;
using (FileStream fs = new FileStream("../../_GameMeshTri1_color.blob", FileMode.Open))
{
    blob_data = new byte[fs.Length];
    fs.Read(blob_data, 0, (int)fs.Length);
}
Console.Write($"Input size {SizeSuffix(blob_data.Length)}");

byte[] storage_data;
using (MemoryStream sourceStream = new MemoryStream(blob_data))
using (MemoryStream compressStream = new MemoryStream())
{
    BZip2.Compress(sourceStream, compressStream, true, 4096);
    storage_data = compressStream.ToArray();
}
Console.Write($"Compressed size {SizeSuffix(storage_data.Length)}");

Repo with project: https://github.com/HaydnTrigg/SharpZipLibProblem
Attachments: _GameMeshTri1_color.blob, _GameMeshTri1_color.png

Expected behaviour

BZip2.Compress should compress the file into the output stream "compressStream".

Actual behaviour

The BZip2.Compress function never returns. The code has been tested with other generated binary data without problems; only this specific bitmap data causes the hang.

Version of SharpZipLib

v1.0.0-alpha2 from NuGet

piksel commented 6 years ago

You should probably not use a block size of ~400 MB:

BZip2.Compress(sourceStream, compressStream, true, 4096);

Giving a block size above 9 should probably throw an exception, though. Even with a block size of 1 it still takes ~15 min to compress your sample file, which is pretty terrible. I'd suggest compressing with an external process such as 7z, which is far faster:

> ptime 7z a -tbzip2  _GameMeshTri1_color.blob-7zr.bz2 _GameMeshTri1_color.blob -mx=9
7-Zip [64] 16.04 : Copyright (c) 1999-2016 Igor Pavlov : 2016-10-04

Scanning the drive:
1 file, 67108900 bytes (65 MiB)

Creating archive: _GameMeshTri1_color.blob-7zr.bz2

Items to compress: 1

Files read from disk: 1
Archive size: 731848 bytes (715 KiB)
Everything is Ok

Execution time: 20.579 s
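
As a rough illustration of the external-process suggestion, the 7z command above could be launched from C# along these lines. This is only a sketch: the class and method names (External7z, BuildArguments, Compress) are hypothetical, and it assumes a 7z executable is available on the PATH. The switches are the ones shown in the command above.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper, not part of SharpZipLib.
public static class External7z
{
    // Builds the same switches used in the 7z command quoted above:
    // "a" (add), "-tbzip2" (bzip2 archive type), "-mx=9" (maximum level).
    public static string BuildArguments(string inputPath, string archivePath)
    {
        return $"a -tbzip2 \"{archivePath}\" \"{inputPath}\" -mx=9";
    }

    // Runs 7z and returns its exit code (7-Zip reports 0 on success).
    public static int Compress(string inputPath, string archivePath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "7z", // assumption: 7z is on the PATH
            Arguments = BuildArguments(inputPath, archivePath),
            UseShellExecute = false,
            CreateNoWindow = true
        };

        using (var process = Process.Start(startInfo))
        {
            process.WaitForExit();
            return process.ExitCode;
        }
    }
}
```

A call such as External7z.Compress("_GameMeshTri1_color.blob", "_GameMeshTri1_color.blob-7zr.bz2") would then mirror the command line above. Output is not redirected here, so 7z writes its progress to the parent console.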

There might be a bug here, though. Could you provide another blob that is similar but compresses faster?

piksel commented 6 years ago

I am currently working on replacing the BZip2 code (https://github.com/piksel/SharpZipLib/tree/jbzip2), and with that branch it compresses your blob in about 12 s on the same machine. It actually passes the tests, but it is not ready for production yet: https://i.imgur.com/9OpNONx.png

HaydnTrigg commented 6 years ago

I am unable to reproduce this error with any other file; only that specific file causes it. It's interesting to note that BZip2OutputStream already clamps the block size to the 1-9 range in case an invalid value is passed in:

if (blockSize > 9) {
    blockSize = 9;
}

if (blockSize < 1) {
    blockSize = 1;
}

piksel commented 6 years ago

Yes, the blockSize was just a red herring.

I just wanted a working sample to use as a comparison when debugging the code.
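
To make the red herring concrete, the clamp quoted earlier in the thread can be run on the value 4096 from the original code sample. The sketch below is standalone and does not call SharpZipLib. The names BlockSizeSketch, Clamp and Validate are hypothetical, and the 100,000-byte unit is the standard bzip2 block granularity, assumed here to be what the library uses as well. Validate shows the stricter "throw instead of clamp" behaviour suggested above as one possible alternative, not as the library's actual behaviour.

```csharp
using System;

// Hypothetical standalone sketch; not SharpZipLib code.
public static class BlockSizeSketch
{
    // bzip2 block sizes are multiples of 100,000 bytes (levels 1 to 9).
    private const int BaseBlockSize = 100000;

    // Mirrors the clamp quoted from BZip2OutputStream.
    public static int Clamp(int blockSize)
    {
        if (blockSize > 9) blockSize = 9;
        if (blockSize < 1) blockSize = 1;
        return blockSize;
    }

    // Stricter alternative: reject out-of-range values instead of clamping.
    public static int Validate(int blockSize)
    {
        if (blockSize < 1 || blockSize > 9)
        {
            throw new ArgumentOutOfRangeException(
                nameof(blockSize), "Block size must be between 1 and 9.");
        }
        return blockSize;
    }

    public static void Main()
    {
        int requested = 4096;
        int effective = Clamp(requested);
        // prints "requested 4096 -> effective 9 (900000 bytes per block)"
        Console.WriteLine(
            $"requested {requested} -> effective {effective} " +
            $"({effective * BaseBlockSize} bytes per block)");
    }
}
```

With the clamp in place, passing 4096 behaves the same as passing 9, so the oversized argument cannot by itself explain the hang.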