iipc / jwarc

Java library for reading and writing WARC files with a typed API
Apache License 2.0

GunzipChannel fails on payload with uncompressed size exceeding int_max #54

Closed sebastian-nagel closed 3 years ago

sebastian-nagel commented 3 years ago

A gzip-compressed payload with an uncompressed size exceeding 2^31-1 (the maximum value of a signed 32-bit integer) causes GunzipChannel to fail with the following exception:

$> java -cp target/jwarc-0.13.1-SNAPSHOT.jar org.netpreserve.jwarc.tools.WarcTool extract --payload test-size-int-max-overflow-content-encoding-gzip.warc.gz 975
Exception in thread "main" java.util.zip.ZipException: gzip uncompressed size mismatch
        at org.netpreserve.jwarc.GunzipChannel.readTrailer(GunzipChannel.java:92)
        at org.netpreserve.jwarc.GunzipChannel.read(GunzipChannel.java:70)
        at org.netpreserve.jwarc.tools.ExtractTool.writeBody(ExtractTool.java:81)
        at org.netpreserve.jwarc.tools.ExtractTool.writePayload(ExtractTool.java:70)
        at org.netpreserve.jwarc.tools.ExtractTool.main(ExtractTool.java:156)
        at org.netpreserve.jwarc.tools.WarcTool.main(WarcTool.java:21)

The WARC file test-size-int-max-overflow-content-encoding-gzip.warc.gz (21 kB) contains one record with a payload size of 2^31 bytes.
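
For illustration, here is a minimal sketch of how a size check of this kind can break at exactly 2^31 bytes. It is not the actual GunzipChannel code: the class GzipTrailerCheck and its methods are hypothetical, and the assumption that the failure comes from treating the trailer's ISIZE field as a signed int is a guess. What is certain is that RFC 1952 defines ISIZE as the uncompressed size modulo 2^32, stored as 4 little-endian bytes after the CRC32.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, not part of jwarc.
final class GzipTrailerCheck {
    /** Reads the 4-byte little-endian ISIZE field as an unsigned value held in a long. */
    static long readIsizeUnsigned(byte[] trailer, int offset) {
        int signed = ByteBuffer.wrap(trailer, offset, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        return signed & 0xFFFFFFFFL; // drop the sign extension
    }

    /** Assumed failure mode: ISIZE kept as a signed int and compared to a long count. */
    static boolean naiveMatch(long bytesWritten, int isizeSigned) {
        return bytesWritten == isizeSigned; // the int is sign-extended to long here
    }

    /** Comparison per RFC 1952: ISIZE is the uncompressed size modulo 2^32. */
    static boolean moduloMatch(long bytesWritten, long isizeUnsigned) {
        return (bytesWritten & 0xFFFFFFFFL) == isizeUnsigned;
    }

    public static void main(String[] args) {
        long bytesWritten = 1L << 31; // 2^31 bytes, as in the test record
        // Trailer layout: CRC32 (4 bytes, zeroed here), then ISIZE = 0x80000000.
        byte[] trailer = {0, 0, 0, 0, 0x00, 0x00, 0x00, (byte) 0x80};

        int signed = ByteBuffer.wrap(trailer, 4, 4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getInt();
        long unsigned = readIsizeUnsigned(trailer, 4);

        System.out.println("signed ISIZE:   " + signed);
        System.out.println("unsigned ISIZE: " + unsigned);
        System.out.println("naive match:    " + naiveMatch(bytesWritten, signed));
        System.out.println("modulo match:   " + moduloMatch(bytesWritten, unsigned));
    }
}
```

The same masking also covers payloads of 2^32 bytes or more, where ISIZE wraps around. In Java, Inflater.getBytesWritten() returns the uncompressed count as a long, whereas Inflater.getTotalOut() returns an int.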