icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License

Allow setting buffer sizes in TarOutputStream for higher write performance #219

Open IvanAntipov opened 6 years ago

IvanAntipov commented 6 years ago

I use FileStream with the FileOptions.WriteThrough option in order to reduce OS memory pressure.

using (var fileStream = new FileStreamWrapper(tmpFileName, FileMode.Create, FileAccess.Write, FileShare.None, 8 * 1048576, FileOptions.WriteThrough))
using (var gzoStream = new GZipOutputStream(fileStream, 8 * 1048576))
using (var tarArchive = new TarOutputStream(gzoStream))
{
    // tarEntry and data are created elsewhere
    tarEntry.Size = data.Length;
    tarArchive.PutNextEntry(tarEntry);

    data.CopyTo(tarArchive, 8 * 1048576);
    tarArchive.CloseEntry();
}

So file writes and flushes are not buffered or cached at the OS (Windows) level.

The problem is: TarOutputStream calls TarBuffer.WriteBlock -> TarBuffer.WriteRecord (for each block) -> outputStream.Flush.

This leads to multiple write/flush file operations of ~2 KB each.

Write speed is very low: about 10 times lower than if I compress the tar.gz into a memory stream and then write it to disk.

P.S. BufferedStream does not help in this situation.
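BufferedStream presumably does not help because its Flush passes the flush on to the underlying stream, so every per-record Flush still reaches the write-through file. One possible workaround, sketched below, is a wrapper stream that forwards writes but ignores Flush. The class name FlushSuppressingStream is hypothetical and not part of SharpZipLib, and this sketch has not been benchmarked against the scenario in this issue.

```csharp
using System;
using System.IO;

// Hypothetical helper (not part of SharpZipLib): forwards writes to an inner
// stream but ignores Flush(), so per-record flushes never reach the file.
public sealed class FlushSuppressingStream : Stream
{
    private readonly Stream _inner;

    public FlushSuppressingStream(Stream inner)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => _inner.CanWrite;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    // Deliberately a no-op: the inner stream keeps its own buffer and is
    // flushed once, when this wrapper is disposed.
    public override void Flush() { }

    public override void Write(byte[] buffer, int offset, int count)
        => _inner.Write(buffer, offset, count);

    public override int Read(byte[] buffer, int offset, int count)
        => throw new NotSupportedException();

    public override long Seek(long offset, SeekOrigin origin)
        => throw new NotSupportedException();

    public override void SetLength(long value)
        => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Single final flush, then release the inner stream.
            _inner.Flush();
            _inner.Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Under these assumptions, the wrapper would sit between the file and the compressor, for example new GZipOutputStream(new FlushSuppressingStream(fileStream), 8 * 1048576), so that the file stream's own 8 MB buffer decides when data is written. The trade-off is that an explicit Flush no longer guarantees the data has reached the disk until the stream is disposed.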

piksel commented 3 years ago

I have added a toggle for the flushing of the output buffer, plus a test that runs using FileOptions.WriteThrough. With the SkipFlushOnEveryBlock flag turned off:

Time 00:14.791 throughput 127.03 MB/s (using test size: 1879.05 MB)

With the SkipFlushOnEveryBlock flag turned on:

Time 00:01.789 throughput 1050.14 MB/s (using test size: 1879.05 MB)

These are basically the numbers you experienced as well, but the question is: what do you actually gain by buffering the data in .NET instead of letting the OS buffer it? It seems like it would be a lot less efficient. That being said, I don't really get why TarBuffer was flushing so excessively after every write in the first place, so having a flag to turn it off should at least be an option.