From WARC 1.1 section 5.6:
(or ‘application/http; msgtype=request’ and ‘application/http; msgtype=response’ respectively)
Note the space after the semicolon. However, the grammar immediately following this prose disallows spaces in this position; it only allows them inside a parameter value that is enclosed in a quoted-string.
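Revised HTTP standards appear to have addressed this problem, as the grammar in RFC 7231 (section 3.1.1.1) explicitly allows optional whitespace in this position:
    media-type = type "/" subtype *( OWS ";" OWS parameter )
Where OWS is defined in RFC 7230 (section 3.2.3):
    OWS = *( SP / HTAB )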
Future revisions / errata of the WARC standard should make the same grammar correction.
Note that Heritrix writes the Content-Type header for HTTP requests and responses with a space after the semicolon, so a very large number of WARCs in the wild require this grammar change in order to be parsed successfully.
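To illustrate what the relaxed grammar means for a parser, here is a minimal Python sketch that accepts optional whitespace (space or tab) on either side of each semicolon, following the RFC 7231 media-type rule. The function name `parse_media_type` and the regular expressions are assumptions made for this example; they are not taken from any WARC library, and a production parser would need more careful error handling.

```python
import re

# tchar set from RFC 7230 section 3.2.6; a token is one or more tchars.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# quoted-string: double quotes around text, backslash escapes allowed.
QUOTED = r'"(?:[^"\\]|\\.)*"'

MEDIA = re.compile(rf"({TOKEN})/({TOKEN})")
# OWS = *( SP / HTAB ) is tolerated before and after each ";".
PARAM = re.compile(rf"[ \t]*;[ \t]*({TOKEN})=({TOKEN}|{QUOTED})")


def parse_media_type(value):
    """Hypothetical helper: return (type, subtype, params) or raise ValueError."""
    value = value.strip(" \t")
    m = MEDIA.match(value)
    if not m:
        raise ValueError(f"invalid media type: {value!r}")
    params = {}
    pos = m.end()
    while pos < len(value):
        p = PARAM.match(value, pos)
        if not p:
            raise ValueError(f"invalid parameter at offset {pos}: {value!r}")
        name, raw = p.group(1), p.group(2)
        if raw.startswith('"'):
            # Strip the surrounding quotes and undo backslash escapes.
            raw = re.sub(r"\\(.)", r"\1", raw[1:-1])
        params[name.lower()] = raw
        pos = p.end()
    return m.group(1).lower(), m.group(2).lower(), params
```

With this sketch, both the form written by Heritrix (`application/http; msgtype=request`) and the form the strict WARC grammar permits (`application/http;msgtype=request`) yield the same parsed result.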