iipc / warc-specifications

Centralised repository for WARC usage specifications.
http://iipc.github.io/warc-specifications/

Section 8 in conflict with section 6 for warc-fields #50

Open wumpus opened 5 years ago

wumpus commented 5 years ago

In addition to these MIME types not being registered (#33), the standard is inconsistent about whether the Content-Type of warcinfo and metadata records shall be, may be, or is recommended to be application/warc-fields (quoting from the WARC 1.1 standard):

 section 6.2 'warcinfo'
   The format of this descriptive record block may vary, though the use of the "application/warc-fields" content-type is recommended.
 section 6.6 'metadata'
   The "application/warc-fields" format may be used
 section 8
   The MIME type of warcinfo records, WARC metadata records, and potentially other records types in the future, shall be application/warc-fields.

JustAnotherArchivist commented 5 years ago

There exist at least two WARC-writing tools which use a different content type for warcinfo records: crocoite and qwarc both write JSON data with Content-Type: application/json; charset=utf-8.

The interpretation of warcinfo record contents is likely to be tool-dependent anyway since the fields are not really standardised (section 6.2 only gives some recommendations). Therefore, it doesn't make much sense to me to restrict the content type to the fairly inflexible WARC header field format.
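
To illustrate what this means for a reader, here is a minimal Python sketch of dispatching on the Content-Type of a warcinfo block instead of assuming application/warc-fields. The function names `parse_warc_fields` and `parse_warcinfo_block` are hypothetical and not taken from any existing tool, the warc-fields parsing is a simplified reading of "name: value" lines, and UTF-8 is assumed regardless of any charset parameter.

```python
import json


def parse_warc_fields(block: bytes) -> dict:
    """Parse "name: value" lines into a dict (simplified; last value wins)."""
    fields = {}
    for line in block.decode("utf-8").splitlines():
        name, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank or malformed lines
        fields[name.strip()] = value.strip()
    return fields


def parse_warcinfo_block(content_type: str, block: bytes):
    """Pick a parser from the record's Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" (assumption: always UTF-8).
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/warc-fields":
        return parse_warc_fields(block)
    if media_type == "application/json":
        return json.loads(block.decode("utf-8"))
    raise ValueError(f"unsupported warcinfo content type: {media_type!r}")


# Hypothetical example blocks, not copied from real WARC files.
fields_info = parse_warcinfo_block(
    "application/warc-fields",
    b"software: example/1.0\r\nformat: WARC File Format 1.1\r\n",
)
json_info = parse_warcinfo_block(
    "application/json; charset=utf-8",
    b'{"software": "example/1.0"}',
)
```

A reader written this way accepts both styles of warcinfo record; one that follows section 8 literally would reject the JSON case.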