It appears that `warcio recompress` will add `WARC-Block-Digest` fields to records that do not already have that field.

In the ZIP (example-warcs.zip) there are 2 WARCs.

In `orig.warc`, the `warcinfo` record at the start does not have a `WARC-Block-Digest` field at all. However, if you run `warcio recompress` on it and look at `warc-recompress.warc`, you will see that the `warcinfo` record now has a `WARC-Block-Digest` with a SHA1 hash. (I included a copy of `warc-recompress.warc` in the ZIP.)

While I suppose more digests aren't a bad thing:

- I would not expect a recompression operation to alter the records in the WARC.
- This behavior isn't documented.
- It (very slightly) increases the size of the WARC.
My suggestion would be that `warcio recompress` should not alter the records of the WARC it is operating on.
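To make the before/after comparison easy to repeat, here is a minimal sketch that lists the records in a WARC that have no `WARC-Block-Digest`. It deliberately uses only the Python standard library rather than warcio's `ArchiveIterator`, so the check does not depend on the tool being reported on. The helper names (`block_digest`, `open_warc`, `iter_records`, `records_missing_digest`) and the script name `check_digests.py` are hypothetical. The parser is simplified and assumes well-formed records without folded header lines. `block_digest` only illustrates the common `sha1:` plus base32 format of WARC digest values; it is not taken from warcio's source.

```python
from __future__ import annotations

import base64
import gzip
import hashlib
import sys
from typing import BinaryIO, Iterator, Optional


def block_digest(block: bytes) -> str:
    """Format a SHA-1 digest the way WARC digest fields usually do: sha1:<base32>."""
    return "sha1:" + base64.b32encode(hashlib.sha1(block).digest()).decode("ascii")


def open_warc(path: str) -> BinaryIO:
    """Open a WARC; gzip (including multi-member, per-record gzip) is detected by magic bytes."""
    with open(path, "rb") as f:
        magic = f.read(2)
    return gzip.open(path, "rb") if magic == b"\x1f\x8b" else open(path, "rb")


def iter_records(stream: BinaryIO) -> Iterator[tuple[dict[str, str], bytes]]:
    """Yield (headers, block) for each record; header names are lower-cased.

    Simplified: assumes well-formed records and no folded (multi-line) headers.
    """
    while True:
        version = stream.readline()
        if not version:
            return  # end of stream
        if not version.strip():
            continue  # tolerate extra blank lines between records
        headers: dict[str, str] = {}
        for raw in iter(stream.readline, b""):
            line = raw.rstrip(b"\r\n")
            if not line:
                break  # an empty line ends the header block
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip().lower()] = value.strip()
        block = stream.read(int(headers["content-length"]))
        stream.read(4)  # skip the CRLF CRLF that terminates every record
        yield headers, block


def records_missing_digest(stream: BinaryIO) -> list[tuple[str, Optional[str]]]:
    """Return (WARC-Type, WARC-Record-ID) for each record without WARC-Block-Digest."""
    return [
        (h.get("warc-type", "?"), h.get("warc-record-id"))
        for h, _block in iter_records(stream)
        if "warc-block-digest" not in h
    ]


if __name__ == "__main__":
    # Example: python check_digests.py orig.warc warc-recompress.warc
    for path in sys.argv[1:]:
        with open_warc(path) as f:
            print(path, records_missing_digest(f))
```

Running it as `python check_digests.py orig.warc warc-recompress.warc` should, if the behavior described above holds, list the `warcinfo` record for `orig.warc` but not for `warc-recompress.warc`.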