icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License
3.7k stars 976 forks source link

Async entry read causes exception when reading AES encrypted zip. #823

Open rilehudd opened 1 year ago

rilehudd commented 1 year ago

Describe the bug

When trying to read an encrypted archive's entry's input stream asynchronously, it reports an exception containing message: "Auth code missing from input stream".

Yet, if the same archive is read synchronously it works fine.

Reproduction Code

https://dotnetfiddle.net/RhJM7o

Steps to reproduce

It doesn't seem to happen for every input. But for inputs where it does happen, changing the async call to the synchronous version fixes the problem.

The zip input is in the dotnetfiddle.

The steps to create the zip archive from 7-Zip are:

  1. Using the file provided (appendix A)

  2. 7-Zip -> Add to archive

  3. Ensure params:

    • Archive Format: Zip
    • Compression level: 5 - Normal
    • Compression method: * Deflate
    • Encryption method: AES-256
    • Set password (I set mine to "test")
  4. Run the decompress/decrypt steps outlined in the dotnetfiddle

Expected behavior

For the same input, I expected them both to work (the synchronous and the asynchronous).

Operating System

Windows

Framework Version

.NET 6

Tags

ZIP, Encryption, Async

Additional context

No response

piksel commented 1 year ago

Thanks for the report and for actually using the dotnet fiddle template! I compacted it somewhat to emphasise that the only difference is the async/sync calls: https://dotnetfiddle.net/5alzjK

ArchiveDiag report for the file (nothing stands out afaikt): https://archivediag.piksel.se/reports/asyncaes.zip-1681140301148006.html

I have not found I cause yet, but I will look into this as soon I can.

piksel commented 1 year ago

Yeah, this is yet another time that the framework bypasses the public methods of CryptoStream for increased performance. ZipAESStream overrides the Read and ReadAsync methods of CryptoStream, but as you can see from the

Stacktrace ``` at ICSharpCode.SharpZipLib.Encryption.ZipAESTransform.TransformFinalBlock(Byte[] inputBuffer, Int32 inputOffset, Int32 inputCount) in /_/src/ICSharpCode.SharpZipLib/Encryption/ZipAESTransform.cs:line 141 at System.Security.Cryptography.CryptoStream.ReadAsyncCore(Memory`1 buffer, CancellationToken cancellationToken, Boolean useAsync) at System.Security.Cryptography.CryptoStream.ReadAsyncInternal(Memory`1 buffer, CancellationToken cancellationToken) at System.IO.StreamReader.ReadBufferAsync(CancellationToken cancellationToken) at System.IO.StreamReader.ReadToEndAsyncInternal() at Program.<
$>g__Repro|0_0(String type) at Program.<
$>g__TestAsync|0_2[T](Func`2 a, T input) ```

, StreamReader just bypasses ReadAsync and calls ReadAsyncCore directly, which bypasses our read logic. That method is internal and cannot be overridden by us.

This is essentially #572 all over again, but with no clear solution.

piksel commented 1 year ago

As for consumers (including @rilehudd), you can replace the StreamReader with:

var entry = zip.GetEntry("abcdefgh.txt");
var entryStream = zip.GetInputStream(entry);
var entryBuffer = new byte[entry.Size];
using (var ms2 = new MemoryStream(entryBuffer))
{
    byte[] buffer = new byte[81920];
    int read;
    while ((read = await entryStream.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        ms2.Write(buffer, 0, read);
    }

    var output = Encoding.UTF8.GetString(entryBuffer);

      // ... use output ...
}

It's not ideal, for sure, but that is the only async path that avoids the optimisations in .NET that bypasses the AES encryption logic.

rilehudd commented 1 year ago

Is there a way we can detect this "bypassed" state an compensate for this optimization? For example detect that ZipAESTransform.TransformFinalBlock has been called from the non-expected path (to implement what CryptoStream is expecting). I'm may not be understanding the feasibility of attempting to compensate for the optimization. Or maybe it is possible but would result in unacceptable complexity. (I wonder what dotnet designers expected implementors to do with them bypassing public methods).

Otherwise, it seems like this might be an optimization that might likely affect many encryption stream implementations. Do you think it makes sense to submit this optimization bug upstream to dotnet?

rilehudd commented 1 year ago

Sorry didn't intend to close this issue (wanted to ensure you see this and have an opportunity to respond so I might see what your thoughts are).

piksel commented 1 year ago

I think the main issue is that extending CryptoStream is an unintended way to implement decryption and I don't know why it was done this way. It is currently unrecoverable, since part of the auth code gets consumed by TransformBlock. The ZipAESStream uses a sliding buffer for this that should be possible to move over to the transform.

rilehudd commented 1 year ago

Thanks for taking the time to take a look at this!

piksel commented 1 year ago

There is another solution to this. The compression method that is used for the entry is STORE, which means that it's uncompressed. This problem only happens to such entries because we directly return the ZipAESStream to the consumer. If there is some decompression to be done, the stream will be wrapped in the appropriate decompression stream, and the optimized path is not taken. We could add a StoreOutputStream that just passes the data through, but in doing so mitigates this problem.