iipc / jwarc

Java library for reading and writing WARC files with a typed API
Apache License 2.0

GzipChannel write() returns compressed length rather than buffer consumption #44

Closed ato closed 4 years ago

ato commented 4 years ago

GzipChannel.write() returns the number of compressed bytes written to the underlying channel rather than the number of uncompressed bytes consumed from the buffer. While this is useful information, it is unfortunately not what the WritableByteChannel interface means by "bytes written", and it confuses standard methods that operate on channels, such as FileChannel.transferTo().
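To illustrate the contract in question, here is a minimal sketch of a compressing channel whose write() reports the bytes it consumed from the source buffer. The class name ConsumedCountingGzipChannel is hypothetical and this is not jwarc's GzipChannel; it delegates compression to java.util.zip.GZIPOutputStream purely to keep the example short.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; not the jwarc implementation.
final class ConsumedCountingGzipChannel implements WritableByteChannel {
    private final GZIPOutputStream gzip;
    private boolean open = true;

    ConsumedCountingGzipChannel(WritableByteChannel out) throws IOException {
        // The GZIPOutputStream constructor writes the gzip header to `out`.
        this.gzip = new GZIPOutputStream(Channels.newOutputStream(out));
    }

    @Override
    public int write(ByteBuffer src) throws IOException {
        int consumed = src.remaining();
        byte[] chunk = new byte[consumed];
        src.get(chunk);                  // advances src.position() by `consumed`
        gzip.write(chunk, 0, consumed);  // compressed size may be smaller or larger
        return consumed;                 // uncompressed bytes taken from src
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() throws IOException {
        if (!open) return;
        open = false;
        gzip.close(); // writes the gzip trailer, then closes the underlying channel
    }
}
```

With this return value, a caller looping on `while (buf.hasRemaining()) channel.write(buf);` or summing the results of write() sees counts that match the buffer's own position, which is what generic channel utilities tend to assume.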

sebastian-nagel commented 4 years ago

Good point! This would require adding a field to track the position in the output, assuming that is what WarcWriter should track as "position". The position in the compressed WARC file is needed if CDX records are created while writing the WARC.
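One way the separate position field could look, as a rough sketch: a small wrapper that sits between the compressing layer and the file and counts the bytes that actually reach the output. PositionTrackingChannel and its position() method are hypothetical names, not part of jwarc; how WarcWriter would use such a value is an assumption here.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

// Hypothetical helper: counts bytes written to the underlying (compressed) output.
final class PositionTrackingChannel implements WritableByteChannel {
    private final WritableByteChannel delegate;
    private long position; // offset in the output, i.e. in the compressed file

    PositionTrackingChannel(WritableByteChannel delegate) {
        this.delegate = delegate;
    }

    @Override
    public int write(ByteBuffer src) throws IOException {
        int written = delegate.write(src);
        position += written; // accumulate what the delegate reports as written
        return written;
    }

    // Offset at which the next byte will land; the kind of value a CDX
    // record would need for the start of a compressed WARC record.
    long position() {
        return position;
    }

    @Override
    public boolean isOpen() {
        return delegate.isOpen();
    }

    @Override
    public void close() throws IOException {
        delegate.close();
    }
}
```

Keeping the counter in the output-side wrapper lets the compressing channel's write() return the uncompressed bytes consumed, while the compressed offset remains available through position().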