iipc / warc-specifications

Centralised repository for WARC usage specifications.
http://iipc.github.io/warc-specifications/

Question: Usage of LWS on writing #65

Closed blueeberry0o closed 4 years ago

blueeberry0o commented 4 years ago

In The WARC File Format, section 4 "File and record model", the fifth paragraph says:

The field value may be preceded by any amount of linear white space (LWS), though a single space is preferred. Header fields can be extended over multiple lines by preceding each extra line with at least one space or tab character.

But there is no further explanation of when this should be used while creating/writing a WARC file, and I also found no line length limit or maximum size definition. Does that mean this exists only for WARC/1.0 conversion and should not be used, or did I miss a place where it is explained further?
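To make the quoted rule concrete, here is a minimal reader-side sketch of what it permits. It assumes the header block has already been separated from the `WARC/1.0` version line, that lines end with CRLF, and that a folded value is joined with a single space. That joining policy is an assumption, not something the specification text above prescribes. `parse_header_block` and the `X-Note` field are hypothetical names used only for illustration.

```python
def parse_header_block(block: str) -> list[tuple[str, str]]:
    """Parse named fields, unfolding continuation lines.

    Assumed policy: a continuation line (one starting with a space or tab)
    is appended to the previous value, separated by a single space.
    """
    fields: list[tuple[str, str]] = []
    for line in block.split("\r\n"):
        if not line:
            continue  # skip the empty string left after the final CRLF
        if line[0] in " \t":
            # Continuation line: extend the most recent field value.
            if not fields:
                raise ValueError("continuation line before any field")
            name, value = fields[-1]
            fields[-1] = (name, (value + " " + line.strip(" \t")).strip())
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"not a header field: {line!r}")
            # Any amount of leading LWS before the value is dropped.
            fields.append((name, value.strip(" \t")))
    return fields


block = "WARC-Type:   response\r\nX-Note: first part\r\n\tsecond part\r\n"
print(parse_header_block(block))
```

Both the extra spaces after `WARC-Type:` and the tab-indented continuation line are accepted here, which is the behaviour the quoted paragraph describes for readers.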

ato commented 4 years ago

I recommend not writing WARC files with line folding. Since there is no line length limit and there are no standard header fields that should reasonably contain multiline content, I don't see any reason to use it. There's also a practical reason: implementations vary on whether the line break or the leading white space is interpreted as part of the field value.

Note also that this feature was copied from HTTP but was deprecated in the newer HTTP standards.
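On the writing side, the recommendation above could look roughly like the following sketch: emit each field on one line with a single space after the colon, and refuse values that contain a line break instead of folding them. `format_header` is a hypothetical helper, not part of any existing WARC library, and rejecting such values (rather than replacing the line break) is an assumption about the desired behaviour.

```python
def format_header(name: str, value: str) -> bytes:
    """Serialise one WARC named field without line folding."""
    if "\r" in value or "\n" in value:
        # Folding is legal per the spec text, but readers disagree on how
        # to interpret it, so this sketch rejects multiline values outright.
        raise ValueError(f"multiline value not allowed for {name!r}")
    # Single space after the colon, as the specification prefers.
    return f"{name}: {value.strip()}\r\n".encode("utf-8")


print(format_header("WARC-Type", "response"))
```

Because there is no line length limit, a long value simply stays on one line.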

blueeberry0o commented 4 years ago

Thanks a lot, @ato, for your answer and the RFC link! This perfectly answers the question.