iipc / warc-specifications

Centralised repository for WARC usage specifications.
http://iipc.github.io/warc-specifications/

'WARC-Identified-Payload-Type' allowed for request, revisit, continuation? #49

Open wumpus opened 5 years ago

wumpus commented 5 years ago

In the 1.1 spec, section 5.19, 'WARC-Identified-Payload-Type' is allowed for anything with a well-defined payload.

That makes sense for response, resource, and conversion.

That doesn't make sense for request, revisit, and continuation.

ato commented 5 years ago

It seems useful to allow it for requests, as the software creating the WARC file may want to identify the content type of the request payload. For example, JavaScript running in a browser might construct a mystery payload that is then recorded by a tool like warcprox.

wumpus commented 5 years ago

Ah, yes, that's a good one. There are a lot of JSON request payloads out there sent with Content-Type text/plain. And a revisit could potentially be in the same situation.
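To make the mislabelled-JSON case concrete, here is a minimal sketch of how a recording tool might pick a value for WARC-Identified-Payload-Type on a request record. The helper name `identify_payload_type` is hypothetical, and this is not how warcprox or any other tool is known to behave; it only assumes that "parses as a JSON object or array" is good enough evidence to report application/json.

```python
import json


def identify_payload_type(declared_type: str, payload: bytes) -> str:
    """Return a sniffed media type, falling back to the declared Content-Type.

    Hypothetical helper: only JSON objects and arrays are recognised.
    """
    try:
        parsed = json.loads(payload.decode("utf-8"))
    except ValueError:
        # Covers both invalid UTF-8 and invalid JSON.
        return declared_type
    if isinstance(parsed, (dict, list)):
        return "application/json"
    # Bare scalars such as 123 or "x" are too ambiguous to relabel.
    return declared_type


# A request body declared as text/plain that is really JSON.
sniffed = identify_payload_type("text/plain", b'{"query": "warc"}')
unchanged = identify_payload_type("text/plain", b"just some text")
```

The declared Content-Type would stay in the recorded HTTP headers; only the sniffed value would go into WARC-Identified-Payload-Type.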

Continuation records have a conflicting status. In clause 7, "Segments other than the first should not contain other optional fields" prohibits WARC-Identified-Payload-Type, and that conflicts with 5.19.
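The conflict can be written out as two checks, one per clause. This is only a sketch of the reading given in this thread: the function names are hypothetical, and whether a continuation record counts as having a well-defined payload under 5.19 is an assumption passed in by the caller rather than something the code decides.

```python
def allowed_by_5_19(has_well_defined_payload: bool) -> bool:
    """5.19 as read in this thread: the field is allowed on any record
    with a well-defined payload."""
    return has_well_defined_payload


def excluded_by_clause_7(record_type: str, segment_number: int) -> bool:
    """Clause 7 as quoted: segments other than the first should not
    contain other optional fields."""
    return record_type == "continuation" and segment_number > 1


def rules_conflict(record_type: str, segment_number: int,
                   has_well_defined_payload: bool = True) -> bool:
    """True when 5.19 would allow the field but clause 7 would exclude it."""
    return (allowed_by_5_19(has_well_defined_payload)
            and excluded_by_clause_7(record_type, segment_number))


# Second segment of a segmented record: the two clauses disagree.
continuation_conflict = rules_conflict("continuation", 2)
# An ordinary, unsegmented response record: no disagreement.
response_conflict = rules_conflict("response", 1)
```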