Some servers put optional white space after the chunk-size, which causes the following exception:
org.netpreserve.jwarc.ParsingException: chunked encoding at position 6944: ..."></span></a><ul class=dropdown-men\r\nD61<-- HERE --> \r\nu><li><a href="/mena/en/marketing/cor...
at org.netpreserve.jwarc.ChunkedBody.parse(ChunkedBody.java:203)
at org.netpreserve.jwarc.ChunkedBody.read(ChunkedBody.java:70)
It looks like the chunk-size is padded with blanks when it is shorter than 4 hex digits. RFC 7230 does not allow optional white space in this position. However, assuming that the Server header correctly indicates "Apache-Coyote/1.1", I tried to figure out whether this is a systematic problem. The issue is discussed in https://bz.apache.org/bugzilla/show_bug.cgi?id=41364, and it turns out that RFC 2616 allows optional "linear white space" after the chunk-size, and possibly also in other positions where it is not yet considered:
implied *LWS
The grammar described by this specification is word-based. Except
where noted otherwise, linear white space (LWS) can be included
between any two adjacent words (token or quoted-string), and
between adjacent words and separators, without changing the
interpretation of a field.
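
For illustration only, a lenient chunk-size line parser that tolerates the trailing blanks seen above could look roughly like the sketch below. It is not jwarc's actual ChunkedBody code. The class name LenientChunkSize and the method parse are hypothetical, and the sketch assumes the caller has already stripped the trailing CRLF and that chunk extensions (anything after ";") can be ignored.

```java
public final class LenientChunkSize {
    private LenientChunkSize() {}

    // Returns the value of an ASCII hex digit, or -1 if c is not one.
    private static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    /**
     * Parses a chunk-size line without its trailing CRLF.
     * Accepts optional SP / HTAB after the hex digits (RFC 2616 style
     * implied LWS) and ignores any chunk extension starting with ';'.
     */
    public static long parse(String line) {
        int i = 0;
        long size = 0;
        while (i < line.length()) {
            int digit = hexValue(line.charAt(i));
            if (digit < 0) break;
            // Guard against overflow before shifting in the next digit.
            if (size > (Long.MAX_VALUE >>> 4)) {
                throw new IllegalArgumentException("chunk-size too large: " + line);
            }
            size = (size << 4) | digit;
            i++;
        }
        if (i == 0) {
            throw new IllegalArgumentException("missing chunk-size: " + line);
        }
        // Skip the optional padding that strict RFC 7230 parsing rejects.
        while (i < line.length() && (line.charAt(i) == ' ' || line.charAt(i) == '\t')) {
            i++;
        }
        // Only end of line or a chunk extension may follow.
        if (i < line.length() && line.charAt(i) != ';') {
            throw new IllegalArgumentException("unexpected character in chunk-size line: " + line);
        }
        return size;
    }
}
```

With this sketch, LenientChunkSize.parse("D61 ") and LenientChunkSize.parse("D61") both yield the same chunk length, while a line such as "D61 x" is still rejected.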
Captured using wget: http_chunked_3c.warc.gz