ato closed this 6 years ago
Makes sense to me.
I wonder if it is appropriate to include some kind of "errata" as well to address how this was mishandled in the previous standard?
I added a Document History section with this kind of thing in mind, but maybe a dedicated Errata bit would be better?
My experience in this area is very limited, but in most of the standards I have read, the errata are a separate document associated with the version containing the error, e.g. #25.
The revisions I've seen note changes that raise compatibility concerns either in a "Changes since 1.0" section or inline where the relevant item is discussed. For example:
In version 1.0 of the WARC standard the uri grammar rule was defined incorrectly with respect to the examples in the specification and with common implementations. For compatibility, implementations may choose to accept, but should never emit, URIs surrounded by '<' and '>' in the WARC-Target-URI and WARC-Profile fields.
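A minimal sketch of that "accept but never emit" behaviour, in Python. The function names read_uri_field and write_uri_field are hypothetical and not taken from any existing WARC library; this only illustrates the idea under that assumption:

```python
def read_uri_field(value: str) -> str:
    """Return the URI from a header value.

    Tolerates the WARC 1.0 style '<...>' wrapping on input (hypothetical helper).
    """
    value = value.strip()
    if len(value) >= 2 and value.startswith("<") and value.endswith(">"):
        # Legacy form: drop the surrounding angle brackets.
        return value[1:-1]
    return value


def write_uri_field(uri: str) -> str:
    """Serialise a URI for output; only the bare form is ever emitted."""
    return read_uri_field(uri)
```

A reader built this way treats "<http://example.com/>" and "http://example.com/" as the same value, while a writer always produces the bare form.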
@anjackson, should I add a document history entry to this pull request? I'd be happy to do so. I wasn't sure if it would cause problems when merging and whether the date should refer to now or the date of merging.
The following changes have been integrated in the revised ISO draft during the ISO working group meeting on November 16-17, 2015:
In section 4, "File and record model", change the definition of uri and add a note: uri = <'URI' per RFC3986>
NOTE: in WARC 1.0 standard (ISO 28500:2009), uri was defined as "<" <'URI' per RFC3986> ">". This rule has been changed to meet requests from implementers.
Included in WARC 1.1
In the examples and in all popular implementations, URIs in the WARC-Target-URI and WARC-Profile fields are not surrounded by "<" and ">" characters. This change makes the grammar consistent with practice by removing "<" and ">" from the basic uri rule and introducing a new record-id rule for the fields WARC-Record-ID, WARC-Concurrent-To, WARC-Refers-To, WARC-Warcinfo-ID and WARC-Segment-Origin-ID.
Fixes #23
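As a rough illustration of the split between the two rules, here is a Python sketch of a header formatter. It assumes the record-id rule is the uri rule wrapped in "<" and ">", which is how I read the change; format_header and RECORD_ID_FIELDS are hypothetical names, not part of any existing library:

```python
# Fields that use the record-id rule, as listed in the description above.
RECORD_ID_FIELDS = {
    "WARC-Record-ID",
    "WARC-Concurrent-To",
    "WARC-Refers-To",
    "WARC-Warcinfo-ID",
    "WARC-Segment-Origin-ID",
}


def format_header(name: str, uri: str) -> str:
    """Format one header line (hypothetical helper).

    Record-id fields are wrapped in angle brackets (assumed form of the
    record-id rule); every other URI-valued field is written bare.
    """
    bare = uri.strip()
    if len(bare) >= 2 and bare.startswith("<") and bare.endswith(">"):
        # Normalise input that already carries brackets.
        bare = bare[1:-1]
    if name in RECORD_ID_FIELDS:
        return f"{name}: <{bare}>"
    return f"{name}: {bare}"
```

With this, format_header("WARC-Record-ID", "urn:uuid:1234") keeps the brackets, while format_header("WARC-Profile", "<http://example.com/p>") drops them.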