mattosaurus / PgpCore

.NET Core class library for using PGP
MIT License
240 stars 100 forks

DecryptStream causes memory bloat when working with in-memory pipelines #116

Open aboone-fusion opened 3 years ago

aboone-fusion commented 3 years ago

All versions of DecryptStream end up calling Stream.PipeAll, which reads the entire input and writes it to the output stream immediately. That works fine when the output stream is meant to accept the entire payload at once (e.g. a FileStream opened for writing), but for in-memory (pipeline) use cases it either creates memory pressure or requires intermediate I/O to flush the result and then reopen it as a new stream. In other words, DecryptStream behaves more like CopyTo than like a traditional stream constructor.

My use case was files (filename.csv.gz.pgp) that are decrypted, decompressed, and then parsed for content.

To work around this, I copied a DecryptStream implementation, left out the PipeAll call, and instead returned a BufferedStream wrapping Ld.GetInputStream(), with a managed Dispose().

The returned stream can then be passed to another stream constructor (e.g. GZipStream) without the entire file being loaded into memory at any time. This approach consumes around 50k of memory for large files during the input FileStream's lifetime (my use case was 700 MB files), and it fits well with existing platforms and streams.

Workflow example for a function returning an IEnumerable of data records, without all the dispose/using context:

    Stream pipe = new System.IO.FileStream(path, FileMode.Open);
    pipe = new PgpCoreWrapper.DecryptionStream(pipe, keys, password);
    pipe = new System.IO.Compression.GZipStream(pipe, CompressionMode.Decompress);
    var reader = new System.IO.StreamReader(pipe);
    var csv = new CsvHelper.CsvReader(reader, config);
    ...parse csv data...
    yield return new record(data);
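For readers who want the dispose/using context filled in, the workflow above might look roughly like the sketch below. It is not PgpCore code: it uses only base class library types so that it runs as is, `OpenDecrypted` is a hypothetical stand-in for the decrypting wrapper stream (here it just passes the input through a BufferedStream), and the sample CSV data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class PipelineSketch
{
    // Hypothetical stand-in for the decrypting wrapper stream described above.
    // It only buffers the input; a real implementation would return a stream
    // that decrypts as the consumer reads from it.
    public static Stream OpenDecrypted(Stream encrypted) => new BufferedStream(encrypted);

    // Lazily yields one line at a time. Each stage pulls from the stage below it,
    // so no stage copies the whole payload into memory.
    public static IEnumerable<string> ReadRecords(Stream source)
    {
        using Stream decrypted = OpenDecrypted(source);
        using var gzip = new GZipStream(decrypted, CompressionMode.Decompress);
        using var reader = new StreamReader(gzip, Encoding.UTF8);

        while (reader.ReadLine() is string line)
        {
            yield return line;
        }
    }

    // Builds a small gzip-compressed CSV payload so the sketch is self-contained.
    public static MemoryStream MakeSample()
    {
        var buffer = new MemoryStream();
        using (var gzip = new GZipStream(buffer, CompressionMode.Compress, leaveOpen: true))
        using (var writer = new StreamWriter(gzip, new UTF8Encoding(false)))
        {
            writer.Write("id,name\n1,alpha\n2,beta\n");
        }
        buffer.Position = 0;
        return buffer;
    }

    public static void Main()
    {
        foreach (string record in ReadRecords(MakeSample()))
        {
            Console.WriteLine(record);
        }
    }
}
```

Because `ReadRecords` is an iterator, the `using` declarations stay alive until the caller finishes (or abandons) the enumeration, at which point the whole chain is disposed from the reader down to the source stream.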

mattosaurus commented 3 years ago

Nice, I tried to reduce memory usage previously by tweaking the stream usage but obviously wasn't completely successful.

Is this something you'd be able to do and submit a PR for?

Or, if not, could you provide a full worked example that I can base my own PR on?

aboone-fusion commented 3 years ago

I put in a PR as an example with some structure. It only covers the one decrypt workflow. More work is needed for it to be fully featured, but the framework is there to implement the other workflows. The only tricky part, I think, is handling the IsIntegrityProtected logic after the stream has been fully read. I've left an open question about whether Stream.CopyTo internally calls Stream.Read or bypasses it. I've assumed Read is bypassed; if it is not, the integrity check is being called twice.
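One way to check the CopyTo question empirically is a small counting wrapper like the sketch below. `EndCheckStream` and its `onEnd` callback are hypothetical names for illustration, not part of PgpCore or the PR. The sketch assumes the wrapper does not override CopyTo itself, in which case the base Stream.CopyTo implementation should go through the overridden Read; the `_ended` flag shows one way to make an end-of-stream check run only once regardless of how many times Read is called after the data runs out.

```csharp
using System;
using System.IO;

// Hypothetical wrapper: forwards reads to an inner stream and runs a callback
// the first time the inner stream reports end of data.
public sealed class EndCheckStream : Stream
{
    private readonly Stream _inner;
    private readonly Action _onEnd;
    private bool _ended;

    public int ReadCalls { get; private set; }

    public EndCheckStream(Stream inner, Action onEnd)
    {
        _inner = inner;
        _onEnd = onEnd;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        ReadCalls++;
        int n = _inner.Read(buffer, offset, count);
        // Guard so the end-of-stream check (e.g. an integrity check) runs only once.
        if (n == 0 && count > 0 && !_ended)
        {
            _ended = true;
            _onEnd();
        }
        return n;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing) _inner.Dispose();
        base.Dispose(disposing);
    }
}

public static class CopyToCheck
{
    // Returns (number of Read calls seen, number of end-of-stream callbacks fired).
    public static (int reads, int checks) Run()
    {
        int checks = 0;
        using var wrapper = new EndCheckStream(new MemoryStream(new byte[1000]), () => checks++);
        wrapper.CopyTo(Stream.Null);        // base Stream.CopyTo, not overridden here
        wrapper.Read(new byte[16], 0, 16);  // extra read past the end of the data
        return (wrapper.ReadCalls, checks);
    }

    public static void Main()
    {
        var (reads, checks) = CopyToCheck.Run();
        Console.WriteLine($"Read calls: {reads}, end-of-stream checks: {checks}");
    }
}
```

If the Read count comes back non-zero after CopyTo, Read is not being bypassed for this wrapper, and a once-only guard of this kind would keep the integrity check from running twice. Stream subclasses are free to override CopyTo, so the result only applies to wrappers that leave it alone.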

mattosaurus commented 3 years ago

Thanks for that, I'll take a look at this when I get a chance and hopefully update PgpCore to be a bit more efficient.