guzba / zippy

Pure Nim implementation of deflate, zlib, gzip and zip.
MIT License
258 stars 31 forks source link

Size verification fails on ~4 GB (`< uint32`) `.tar.gz` file #68

Closed Vindaar closed 1 year ago

Vindaar commented 1 year ago

Hey!

First time finally using your library. Thanks for your work first of all. :)

While trying to see how it performs on some of my larger .tar.gz archives, I noticed that for one of the larger ones extraction via extractAll fails with a size verification error. The issue seems to be that the archive is larger than int32.high, due to the size check here:

https://github.com/guzba/zippy/blob/master/src/zippy/gzip.nim#L76

Given that isize is already a uint32 why do we mod the length of dst by (1 shl 31) and not allow the full uint32 range?

My archive has 476,932 files with a total size of 3,497,146,858 bytes (according to du . -b in the extracted directory).

I also wondered what would happen if we fully exceed the size of uint32, but with a quick try I wasn't able to build a .tar.gz archive that actually enters the uncompressGzip proc. Instead it seems to go here https://github.com/guzba/zippy/blob/master/src/zippy/tarballs.nim#L51 instead (I didn't actually check that, I only noticed that uncompressedGzip wasn't being run for my ~8 GB deflated archive).

Code to reproduce is pretty much just:

import zippy/tarballs
const tmp = "/tmp/foobar"
extractAll("/path/to/file.tar.gz", tmp)

I didn't open a PR, because I assume the line mentioned above is the way it is for a good reason. If it helps I can also send you the archive via google drive or w/e.

Thanks!

guzba commented 1 year ago

Looking quick it just seems like a bug to fix, should probably be (1 shl 32). https://www.rfc-editor.org/rfc/rfc1952.html is the boss and it says ISIZE This contains the size of the original (uncompressed) input data modulo 2^32.

Vindaar commented 1 year ago

Got it!

guzba commented 1 year ago

Thanks for reporting the issue and making a PR!