icsharpcode / SharpZipLib

#ziplib is a Zip, GZip, Tar and BZip2 library written entirely in C# for the .NET platform.
http://icsharpcode.github.io/SharpZipLib/
MIT License
3.7k stars 976 forks source link

GetNextEntryAsync might use too large buffer #845

Open Acilec opened 1 year ago

Acilec commented 1 year ago

Describe the bug

For TAR files GetNextEntryAsync tries to allocates a byte[] buffer of BlockSize 512, using .Rent():

https://github.com/icsharpcode/SharpZipLib/blob/ff2d7c30bdb2474d507f001bc555405e9f02a0bb/src/ICSharpCode.SharpZipLib/Tar/TarInputStream.cs#L548-L556

(also at: https://github.com/icsharpcode/SharpZipLib/blob/ff2d7c30bdb2474d507f001bc555405e9f02a0bb/src/ICSharpCode.SharpZipLib/Tar/TarInputStream.cs#L328-L333)

But .Rent() returns at least the requested size.

After this, in TarBuffer.ReadBlockIntAsync() checkes if buffer size is exactly the same as the requested BlockSize.

if (buffer.Length != BlockSize)
{
    throw new ArgumentException("BUG: buffer must have length BlockSize");
}

Sometimes headerBuf becomes larger than the requested TarBuffer.BlockSize (in my case 1024) and the ArgumentException("BUG: buffer must have length BlockSize") is thrown.

Possible fix:

if (buffer.Length < BlockSize)
{
    throw new ArgumentException("BUG: buffer must have length BlockSize");
}

Reproduction Code

No response

Steps to reproduce

N/A

Expected behavior

No exception for larger buffers

Operating System

No response

Framework Version

No response

Tags

No response

Additional context

No response

menees commented 9 months ago

The suggested possible fix of checking if (buffer.Length < BlockSize) won't be sufficient because several things downstream assume that the headerBuf array will be exactly 512 bytes. For example, TarHeader.ParseBuffer is passed a byte[] headerBuf, and it calls TarHeader.MakeCheckSum at the end, which iterates through buffer.Length. So, if headerBuf is larger than 512 bytes, then the checksum will be wrong.

I arrived here thinking that this bug may be affecting me too, and it shows up looking like #160:

ICSharpCode.SharpZipLib.Tar.TarException: Header checksum is invalid
Stack Trace:
  at ICSharpCode.SharpZipLib.Tar.TarInputStream.GetNextEntryAsync(CancellationToken ct, Boolean isAsync)
  at ICSharpCode.SharpZipLib.Tar.TarInputStream.GetNextEntry()

My development team has been sporadically seeing the "Header checksum is invalid" error in one unit test on the same zip23x-libc5.tar.gz input test file for the last month since we upgraded to SharpZipLib v1.4.2. We never saw this with v1.3.3 for the year and a half we were on that version. I've confirmed that we're all using the same version of the input file, and those bytes haven't changed since April 3, 2012.

I'm not sure how we're encountering this since the check in tarBuffer.ReadBlockIntAsync should come first. But the sporadic nature of it across multiple machines where each developer has their own local copy of the input zip23x-libc5.tar.gz file makes me think that the ArrayPool.Rent behavior could explain its randomness. I'm going to go back to v1.3.3 to see if stability returns to our test.

menees commented 8 months ago

FWIW, after going back to v1.3.3, our test has been stable again for over a month, and no one has seen the "Header checksum is invalid" error again.

marcio-af-oliveira commented 4 months ago

I'm experiencing the same issue, like @menees said, after downgrading to v1.3.3, the "Header checksum is invalid" error is fixed.