This issue only affects crawls that use the `OutbackCDXClient` bean.
`ExtendedWARCWriterProcessor` in ukwa-heritrix sets the metadata property `warcFileRecordLength`, but `OutbackCDXClient` attempts to read the property as `warcRecordLength`.
As a result, every CDX line contains a length field of zero.
This fix changes `OutbackCDXClient` to read `warcFileRecordLength` instead, so CDX lines get the correct length values.
Side note: if you use `OutbackCDXClient`, you must also use `ExtendedWARCWriterProcessor`; otherwise there is no `warcFileRecordLength` to read.
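To illustrate the key mismatch, here is a minimal, hypothetical sketch. It is not the actual Heritrix or ukwa-heritrix code. A plain `Map` stands in for the per-URI metadata written by `ExtendedWARCWriterProcessor`. The class `CdxLengthSketch` and its `readLength` helper are invented for illustration, and the sketch assumes the reader falls back to 0 when the key is absent, which matches the zero-length symptom described in this PR.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: the map stands in for the per-URI metadata that
 * ExtendedWARCWriterProcessor populates. This is not the Heritrix API.
 */
public class CdxLengthSketch {

    // Key written by ExtendedWARCWriterProcessor (per this PR).
    static final String WRITTEN_KEY = "warcFileRecordLength";

    /** Returns the record length stored under key, or 0 if it is absent. */
    static long readLength(Map<String, Object> data, String key) {
        Object value = data.get(key);
        // Assumed fallback: a missing key yields 0, which is how a
        // mismatched key silently produces zero-length CDX lines.
        return (value instanceof Number) ? ((Number) value).longValue() : 0L;
    }

    public static void main(String[] args) {
        Map<String, Object> data = new HashMap<>();
        data.put(WRITTEN_KEY, 12345L); // hypothetical record length

        // Before the fix: the client reads the wrong key and always gets 0.
        System.out.println("warcRecordLength     -> " + readLength(data, "warcRecordLength"));
        // After the fix: the client reads the key that was actually written.
        System.out.println("warcFileRecordLength -> " + readLength(data, WRITTEN_KEY));
    }
}
```

The same fallback explains the side note. If `ExtendedWARCWriterProcessor` is not in the chain, no code writes `warcFileRecordLength`, so even the corrected key returns 0.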