iipc / jwarc

Java library for reading and writing WARC files with a typed API
Apache License 2.0

WarcReader may hang up on clipped gzipped WARC file #17

Closed sebastian-nagel closed 4 years ago

sebastian-nagel commented 4 years ago

WarcReader may hang when processing a gzipped WARC file whose last record is clipped or incomplete (for example, because of an unfinished download or a killed WARC writer). Seen with clipped.warc.gz; I'll try to prepare a unit test that systematically checks the boundary conditions.
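
A rough sketch of what such a boundary check could look like, using only the JDK's `java.util.zip` classes rather than jwarc's `WarcReader`. The class and method names (`ClippedGzipCheck`, `gzip`, `drain`, `countFailures`) are made up for illustration. The idea is to clip a gzip member at every possible length and confirm that reading always terminates, either with the full payload or with an `IOException`. A real test for this issue would feed the same clipped inputs to `WarcReader`, probably with a timeout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not part of jwarc.
public class ClippedGzipCheck {

    // Compress data into a single gzip member.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buf)) {
            out.write(data);
        }
        return buf.toByteArray();
    }

    // Read the gzip input to the end.
    // Returns the number of decoded bytes, or -1 if an IOException was raised.
    static int drain(byte[] gz) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gz))) {
            byte[] chunk = new byte[512];
            int total = 0;
            int n;
            while ((n = in.read(chunk)) != -1) {
                total += n;
            }
            return total;
        } catch (IOException e) {
            return -1;
        }
    }

    // Clip the input at every length shorter than the full member
    // and count how many clipped variants are rejected with an IOException.
    static int countFailures(byte[] full) {
        int failures = 0;
        for (int len = 0; len < full.length; len++) {
            if (drain(Arrays.copyOf(full, len)) < 0) {
                failures++;
            }
        }
        return failures;
    }
}
```

The loop over every prefix length is the "systematic" part: it covers clips inside the gzip header, the compressed body, and the trailer, not just the one position that happened to occur in clipped.warc.gz.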