+1 to skipping trailing empty lines, cf. warcio's archiveiterator.py. With per-record compressed WARC files, the Content-Length is not really required for reading; it's more of a validation feature, like the digests.
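A minimal Python sketch of why Content-Length isn't needed to locate records in a per-record compressed file. It assumes each record is stored as its own gzip member, so the gzip framing alone marks where the next record begins. `iter_gzip_members` is a hypothetical helper for illustration, not warcio's API.

```python
import zlib
from typing import Iterator


def iter_gzip_members(data: bytes) -> Iterator[bytes]:
    """Yield the decompressed bytes of each gzip member in `data`.

    Assumption: one WARC record per gzip member, so each yielded chunk is
    one whole record regardless of what its Content-Length header claims.
    """
    while data:
        # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
        d = zlib.decompressobj(zlib.MAX_WBITS | 16)
        out = d.decompress(data)
        if not d.eof:
            raise ValueError("truncated gzip member")
        yield out + d.flush()
        # Bytes after the end of this member belong to the next member.
        data = d.unused_data
```

For example, `list(iter_gzip_members(gzip.compress(r1) + gzip.compress(r2)))` returns `[r1, r2]`, so an off-by-one Content-Length inside `r1` cannot shift where `r2` starts; the header only serves to validate the block length.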
Some versions of wget generated WARC headers with an off-by-one Content-Length. This causes us to throw:
Examples:
Other implementations appear to ignore this error, perhaps by simply skipping an arbitrary number of CR and LF characters before reading the next record?
I don't want to silently ignore this, but perhaps we could log a warning and attempt to continue.
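A rough sketch of the warn-and-continue idea for uncompressed WARC files, in Python. All names here (`read_records`, `read_header`, `skip_crlf`, the logger name) are hypothetical. After reading Content-Length bytes, it consumes whatever run of CR/LF bytes follows. If that run is not exactly CRLFCRLF, it logs a warning instead of throwing. Header continuation lines are not handled.

```python
import io
import logging
from typing import Dict, Iterator, Optional, Tuple

logger = logging.getLogger("warc_tolerant")  # hypothetical logger name


def skip_crlf(stream: io.BufferedReader) -> bytes:
    """Consume and return any run of CR/LF bytes at the current position."""
    skipped = bytearray()
    # peek() returns b"" at EOF, which ends the loop.
    while stream.peek(1)[:1] in (b"\r", b"\n"):
        skipped += stream.read(1)
    return bytes(skipped)


def read_header(stream: io.BufferedReader) -> Optional[Dict[str, str]]:
    """Parse one WARC header block; return None on a clean EOF."""
    version = stream.readline()
    if not version:
        return None
    if not version.startswith(b"WARC/"):
        raise ValueError(f"expected WARC version line, got {version!r}")
    headers: Dict[str, str] = {}
    for line in iter(stream.readline, b"\r\n"):  # blank line ends the header
        if not line:
            raise ValueError("truncated header")
        name, _, value = line.decode("utf-8").partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def read_records(stream: io.BufferedReader) -> Iterator[Tuple[Dict[str, str], bytes]]:
    """Yield (headers, block) pairs, warning on a malformed record terminator."""
    while True:
        headers = read_header(stream)
        if headers is None:
            return
        length = int(headers["content-length"])
        block = stream.read(length)
        if len(block) != length:
            raise ValueError("truncated record block")
        trailer = skip_crlf(stream)
        if trailer != b"\r\n\r\n":
            logger.warning(
                "record %s: expected CRLFCRLF after block, found %r; continuing",
                headers.get("warc-record-id", "?"), trailer,
            )
        yield headers, block
```

This approach has limits. If Content-Length is one too large, the block keeps a stray `\r` and the trailer becomes `\n\r\n`, which is logged and skipped. If Content-Length is too small and the leftover byte is not CR or LF, the next line will not start with `WARC/` and the reader still raises.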