iipc / warc-specifications

Centralised repository for WARC usage specifications.
http://iipc.github.io/warc-specifications/

Revisit example in section 10.6 uses message/http not application/http #55

Open ato opened 5 years ago

ato commented 5 years ago

A discussion between @ikreymer and @ibnesayeed revealed that in both WARC 1.0 and 1.1 the revisit record example uses message/http as the content-type, whereas application/http is used everywhere else in the standard. This is likely an oversight: draft 0.9 of the WARC standard used the message/http content-type everywhere, but draft 0.10 changed this to application/http.

I'm not aware of the reasoning for the change at the time; however, RFC 7230 has this to say about message/http:

The message/http type can be used to enclose a single HTTP request or response message, provided that it obeys the MIME restrictions for all "message" types regarding line length and encodings.

and application/http:

The application/http type can be used to enclose a pipeline of one or more HTTP request or response messages (not intermixed).

which leads to two reasonable arguments for preferring application/http over message/http:

  1. Arbitrary HTTP messages are not guaranteed to comply with MIME line length limits; indeed, large cookies and Location headers regularly violate them.
  2. In certain circumstances, such as status 103 (Early Hints), HTTP servers may respond with two response messages. It seems reasonable to archive this situation as a single WARC response record containing two HTTP messages.
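
To illustrate the first argument, here is a minimal sketch that flags header lines too long for a MIME "message" type. It assumes the 998-octet hard limit (excluding CRLF) from RFC 5322 is the relevant restriction; the function name `overlong_lines` and the sample response are hypothetical and only check the header section, not the body.

```python
from typing import List

# Assumption: 998 octets per line, excluding CRLF, is the hard limit from
# RFC 5322 that MIME "message" types are expected to obey.
MIME_MAX_LINE_OCTETS = 998


def overlong_lines(http_message: bytes, limit: int = MIME_MAX_LINE_OCTETS) -> List[int]:
    """Return 1-based indexes of header lines longer than `limit` octets."""
    # Only the header section (everything before the first blank line) is checked.
    head, _, _ = http_message.partition(b"\r\n\r\n")
    return [
        index
        for index, line in enumerate(head.split(b"\r\n"), start=1)
        if len(line) > limit
    ]


# Hypothetical response carrying an oversized cookie.
response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Set-Cookie: session=" + b"a" * 2000 + b"\r\n"
    b"\r\n"
    b"<html></html>"
)

print(overlong_lines(response))  # the Set-Cookie line is line 3
```

A response like this one could not be labelled message/http without breaking the MIME restriction, whereas application/http carries no such line length requirement.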

Proposed correction:

Change the revisit example in section 10.6 from:

Content-Type: message/http

to

Content-Type: application/http;msgtype=response

thus making it consistent with section 5.6 and the other examples.
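
For anyone wanting to apply the same change to existing records, the following is a rough sketch of the substitution on a block of WARC named fields. The helper name `use_application_http` and the abbreviated header block are hypothetical (this is not the example text from section 10.6), and a real tool would need to parse records properly rather than edit bytes in place.

```python
import re

# Matches a "Content-Type: message/http" header line. Field names are
# matched case-insensitively, and the line may end in CRLF, LF, or end of input.
_MESSAGE_HTTP = re.compile(
    rb"^(Content-Type:[ \t]*)message/http[ \t]*(?=\r?\n|\Z)",
    re.IGNORECASE | re.MULTILINE,
)


def use_application_http(warc_headers: bytes, msgtype: str = "response") -> bytes:
    """Replace message/http with application/http;msgtype=<msgtype>."""
    replacement = rb"\g<1>application/http;msgtype=" + msgtype.encode("ascii")
    return _MESSAGE_HTTP.sub(replacement, warc_headers)


# Hypothetical, abbreviated revisit record header block.
headers = (
    b"WARC/1.1\r\n"
    b"WARC-Type: revisit\r\n"
    b"Content-Type: message/http\r\n"
    b"\r\n"
)

fixed = use_application_http(headers)
print(fixed.decode("ascii"))
```

Other Content-Type values are left untouched, so running this over records that already use application/http is a no-op.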