oleg-st / ZstdSharp

Port of zstd compression library to c#
MIT License

Reusable `Compressor` & `Decompressor` in `Stream` #5

Closed CHeavyarms closed 2 years ago

CHeavyarms commented 2 years ago

Overview

To enable the reuse of Compressor and Decompressor instances between CompressionStream and DecompressionStream wrappers respectively, this adds an overloaded constructor to each of the Stream children which allows optionally providing an existing Zstandard context that will survive any IDisposable and IAsyncDisposable resource cleanup.

This is particularly useful when needing Stream semantics in conjunction with a Compressor or Decompressor instance that has a dictionary loaded or otherwise has a custom set of parameters applied.
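As a rough illustration of the overload described above, the sketch below writes a payload through a `CompressionStream` that wraps a caller-owned `Compressor`. It assumes the two-argument `CompressionStream(Stream, Compressor)` form of the new constructor; the class and method names are hypothetical, and the `Compressor` is expected to be configured (dictionary, parameters) by the caller beforehand.

```csharp
using System.IO;
using ZstdSharp;

public static class ReusableCompressorSketch
{
    // Hypothetical helper: compresses one payload with a Compressor owned by the caller.
    public static byte[] Compress(Compressor compressor, byte[] payload)
    {
        using var output = new MemoryStream();

        // Disposing the stream finishes the frame; the compressor passed in
        // is expected to survive the disposal and stay usable.
        using (var zstd = new CompressionStream(output, compressor))
        {
            zstd.Write(payload, 0, payload.Length);
        }

        return output.ToArray();
    }
}
```

The same `Compressor` can then be passed to further calls one after another, so a dictionary loaded into it only has to be loaded once.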

oleg-st commented 2 years ago

Thank you @CHeavyarms

CHeavyarms commented 2 years ago

Thanks for the quick merge and for building this excellent C# port of Zstandard, @oleg-st!

Do you have a preferred process for publishing an update to the NuGet package version for which I could potentially contribute some automation as well?

oleg-st commented 2 years ago

Thank you for your interest, @CHeavyarms !

I don't have any special publishing process; I just publish the package manually on NuGet.

fabiang2249 commented 1 year ago

@CHeavyarms, @oleg-st I have problems reusing the Decompressor in a multithreaded environment. My use case is that I want to reuse a dictionary (without having to call LoadDictionary() each time) in multiple DecompressionStreams that run in parallel. However, I get the following error:

ZstdSharp.ZstdException: 'Corrupted block detected'

Code is as follows:

    public byte[] Decompress(MemoryStream stream)
    {
        byte[] decompressedBytes;

        using (var decompressorStream = new DecompressionStream(stream, MyDecompressor))
        {
            using (var decompressedStream = new MemoryStream())
            {
                decompressorStream.CopyTo(decompressedStream);
                decompressedBytes = decompressedStream.ToArray();
            }
        }

        return decompressedBytes;
    }

I have a basic Decompress method that receives a compressed stream and a static Decompressor "MyDecompressor" loaded with a dictionary and a compression level. If I avoid concurrency by using a lock, the issue goes away. If I initialize the DecompressionStream without passing a decompressor, the issue also disappears.

@oleg-st, @CHeavyarms if this is not the way to share a dictionary, do you have some example of how to do that? Thanks in advance! F.G.

oleg-st commented 1 year ago

@fabiang2249 You need to use one Decompressor per thread. This limitation comes from the original zstd library (decompression context). It is possible to load a dictionary by reference to avoid memory allocation and copying, but unfortunately this is available only in the unsafe API: ZstdSharp.Unsafe.Methods.ZSTD_DCtx_loadDictionary_byReference
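A minimal sketch of the one-Decompressor-per-thread approach, assuming the `Decompressor.LoadDictionary(byte[])` method and the `DecompressionStream(Stream, Decompressor)` overload added in this pull request. The class name and the `DictionaryBytes` field are hypothetical placeholders, and the dictionary is still copied once per thread here (the by-reference variant would need the unsafe API).

```csharp
using System;
using System.IO;
using System.Threading;
using ZstdSharp;

public static class PerThreadDecompression
{
    // Hypothetical: raw dictionary bytes, assigned once at startup before any Decompress call.
    public static byte[] DictionaryBytes = Array.Empty<byte>();

    // Each thread lazily creates its own Decompressor and loads the dictionary once,
    // so no decompression context is ever shared between threads.
    private static readonly ThreadLocal<Decompressor> LocalDecompressor =
        new ThreadLocal<Decompressor>(() =>
        {
            var decompressor = new Decompressor();
            decompressor.LoadDictionary(DictionaryBytes);
            return decompressor;
        });

    public static byte[] Decompress(Stream compressed)
    {
        // The stream wrapper is disposed per call; the thread's Decompressor is kept.
        using var zstd = new DecompressionStream(compressed, LocalDecompressor.Value);
        using var output = new MemoryStream();
        zstd.CopyTo(output);
        return output.ToArray();
    }
}
```

No lock is needed because each thread only touches its own context; the cost is one dictionary load per thread rather than one per call.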

fabiang2249 commented 1 year ago

@oleg-st thanks, so I should have some kind of pool of Decompressors, and some sync mechanism to access them? Isn't there a way to create a Decompressor from an existing decompressor context (or at least to copy the initialized dictionary)?

Also in the link you provided there is a section about "Bulk processing dictionary API" which claims: "ZSTD_CDict can be created once and shared by multiple threads concurrently, since its usage is read-only."

Is that functionality implemented in this library?

oleg-st commented 1 year ago

@fabiang2249 You can use a thread pool with one Decompressor per thread.

> Is that functionality implemented in this library?

It's implemented in the unsafe API only.