JustAnotherArchivist opened 3 years ago
This is sort of an edge case, and the whitespace was at one point used to indicate multi-line headers (which have since been deprecated but which warcio still supports). I'm not sure the whitespace is significant anymore from a parsing perspective. Similar to #128, perhaps there could be a 'raw' mode flag that preserves the whitespace here, if desired, when capturing HTTP traffic.
FWIW, I've never seen an HTTP server that returns a header like this, so (I hope) it's not very common :)
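A minimal sketch of what such a 'raw' mode could look like, assuming the parser keeps the exact bytes of each header field alongside its interpreted name and value. `RawHeader`, `parse_headers`, `serialize` and its `raw` flag are hypothetical names for illustration. They are not part of warcio's API, and this is not necessarily the approach proposed in #128.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawHeader:
    """One header field: the exact bytes received plus the interpreted name/value.

    Hypothetical structure for illustration only; not part of warcio.
    """
    raw: bytes  # bytes as received, incl. continuation lines, without the final CRLF
    name: str
    value: str


def parse_headers(block: bytes) -> List[RawHeader]:
    """Parse a header block (no status line, no terminating blank line)."""
    headers: List[RawHeader] = []
    for line in block.split(b"\r\n"):
        if not line:
            continue  # ignore empty lines, e.g. from a trailing CRLF
        if line[:1] in (b" ", b"\t") and headers:
            # Continuation line (obsolete line folding): keep its raw bytes,
            # and join its content to the previous value with a single space.
            prev = headers[-1]
            prev.raw += b"\r\n" + line
            extra = line.strip(b" \t").decode("latin-1")
            prev.value = (prev.value + " " + extra).strip()
            continue
        name, sep, rest = line.partition(b":")
        if not sep:
            raise ValueError("not a header line: %r" % line)
        headers.append(RawHeader(
            raw=line,
            name=name.decode("latin-1"),
            # Leading/trailing spaces and tabs are not part of the value.
            value=rest.strip(b" \t").decode("latin-1"),
        ))
    return headers


def serialize(headers: List[RawHeader], raw: bool = True) -> bytes:
    """Write headers back out; raw=True reproduces the received bytes exactly."""
    if raw:
        return b"".join(h.raw + b"\r\n" for h in headers)
    # Normalized form: one space after the colon, no surrounding whitespace.
    return b"".join(
        ("%s: %s\r\n" % (h.name, h.value)).encode("latin-1") for h in headers
    )
```

With `raw=True`, a field like `X-Custom:  \tfoo\t` is written back byte for byte, including the tab after the value; with `raw=False` it becomes the normalized `X-Custom: foo`. Keeping both views means code that only cares about semantics is unaffected, while archiving code can opt into exact preservation.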
The whitespace on the line with the field-name has never been significant semantically, as far as I know. Neither the whitespace after the colon nor the whitespace at the end of the line is part of the actual field value. And even with continuation lines, the optional whitespace at the end of a line, the CRLF, and the leading space/tab on the continuation line are together equivalent to a single space.
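To make that rule concrete, here is a minimal Python sketch of how a recipient might derive the semantic value from the raw text after the colon: surrounding spaces/tabs are dropped, and each fold (optional trailing whitespace, CRLF, leading space/tab) collapses to one space, which is what RFC 7230 calls `obs-fold`. The function names are hypothetical, and this is not how warcio itself is implemented.

```python
import re
from typing import Tuple

# Obsolete line folding (RFC 7230 "obs-fold"): CRLF followed by at least one
# space/tab. Optional whitespace before the CRLF is folded in as well.
_FOLD = re.compile(r"[ \t]*\r\n[ \t]+")


def semantic_field_value(raw_value: str) -> str:
    """Return the field value as a recipient would interpret it.

    raw_value is everything after the colon (possibly spanning continuation
    lines), without the CRLF that terminates the field.
    """
    # Each fold (trailing whitespace + CRLF + leading SP/HTAB) becomes one space.
    unfolded = _FOLD.sub(" ", raw_value)
    # Whitespace after the colon and at the end of the line is not part of the value.
    return unfolded.strip(" \t")


def parse_header_field(raw_field: str) -> Tuple[str, str]:
    """Split a raw header field into (name, semantic value)."""
    name, sep, rest = raw_field.partition(":")
    if not sep:
        raise ValueError("not a header field: %r" % raw_field)
    return name, semantic_field_value(rest)
```

For example, `parse_header_field("X-Custom:  \tfoo\t")` gives `("X-Custom", "foo")`, and `parse_header_field("X-Custom: foo \r\n\tbar")` gives `("X-Custom", "foo bar")`. A parser like this is semantically correct but discards exactly the whitespace this issue is about, so preserving it requires keeping the original bytes separately.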
But yeah, same as #128, this is about correctly preserving the data sent by the server, not the semantic meaning. I've suggested a possible solution there because they are indeed very similar and have essentially the same root cause.
Yeah, it is fortunately not very common, but I have seen it before, sadly enough. There are a lot of weird HTTP servers out there that operate at the edges of or beyond the specifications...
Expected output for the custom header (where `\t` is a literal tab):

Actual output (only one space between the colon and the value, and the tab after the header is lost):