iipc / warc-specifications

Centralised repository for WARC usage specifications.
http://iipc.github.io/warc-specifications/

Less Restrictive warcinfo/metadata Formats #7

Open PsypherPunk opened 9 years ago

PsypherPunk commented 9 years ago

Currently the warcinfo record permits the following:

Allowable fields include, but are not limited to, all [DCMI] plus the following field definitions. All fields are optional.

The metadata type allows a similar, but subtly different, format:

Allowable fields include all [DCMI] plus the following field definitions. All fields are optional.

DCMI being the Dublin Core Metadata Initiative.

I'd like to suggest that the format be less restrictive. Specifically that the explicit DCMI references be dropped in favour of a mechanism for using any referenceable standard (similar to XML Namespaces, perhaps?). Perhaps more importantly, this might afford us the chance to require that any metadata format actually be defined somewhere.

For reference, see the wpull documentation (thanks @ikreymer) for the way they're using urn:X- URIs to reference external metadata formats.
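To make the namespace idea concrete, here is a rough sketch of how a reader might resolve prefixed field names against declared standards. Everything in it is an assumption for illustration: the `prefix.local` naming convention, the `NAMESPACES` table, the `x-example` prefix with its `urn:X-example:metadata` URI, and the `resolve_field` helper are all hypothetical, and none of them comes from the WARC specification or from wpull. Only the Dublin Core elements namespace URI is a real identifier.

```python
# Hypothetical prefix table: maps a short prefix to the URI of the
# standard that defines the field. Not part of any WARC specification.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",  # Dublin Core elements
    "x-example": "urn:X-example:metadata",     # made-up private format
}


def resolve_field(name, namespaces=NAMESPACES):
    """Expand 'prefix.local' into (namespace URI, local name).

    Raises ValueError when the name has no prefix or the prefix has
    not been declared, so every field must point at a definition.
    """
    prefix, sep, local = name.partition(".")
    if not sep or not local:
        raise ValueError(f"field {name!r} has no namespace prefix")
    if prefix not in namespaces:
        raise ValueError(f"field {name!r} uses undeclared prefix {prefix!r}")
    return namespaces[prefix], local


if __name__ == "__main__":
    print(resolve_field("dc.title"))
    print(resolve_field("x-example.crawl-ticket"))
```

Rejecting undeclared prefixes is what would give the "must actually be defined somewhere" property: a field is only accepted if its prefix maps to a referenceable URI.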

ato commented 4 years ago

Someone asked about this issue on the IIPC Slack. For what it's worth, the phrase "but are not limited to" reads to me as saying that warcinfo fields are completely open to arbitrary data and not restricted at all.
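Under that reading, a consumer would accept any field and at most note which ones it recognises. A minimal sketch, assuming a simplified `name: value` line format with one field per line (no continuation lines). The `KNOWN_FIELDS` set is only an illustrative subset of names, so check the specification for the authoritative list. The helper names, `ExampleCrawler`, and `x-crawl-ticket` are made up for the example.

```python
# Illustrative subset of field names a reader might recognise; this is
# an assumption for the example, not an authoritative list.
KNOWN_FIELDS = {"operator", "software", "format", "conformsTo", "description"}


def parse_warc_fields(payload: str) -> dict[str, str]:
    """Parse simplified 'name: value' lines into a dict.

    Blank lines are skipped. A line without a colon is an error.
    Only the first colon separates name from value, so values may
    themselves contain colons (for example URLs).
    """
    fields = {}
    for line in payload.splitlines():
        if not line.strip():
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'name: value' line: {line!r}")
        fields[name.strip()] = value.strip()
    return fields


def split_known(fields: dict[str, str]):
    """Return (recognised, other) without rejecting anything."""
    known = {k: v for k, v in fields.items() if k in KNOWN_FIELDS}
    other = {k: v for k, v in fields.items() if k not in KNOWN_FIELDS}
    return known, other


if __name__ == "__main__":
    payload = (
        "software: ExampleCrawler/1.0\r\n"
        "conformsTo: http://example.org/spec\r\n"
        "x-crawl-ticket: ABC-123\r\n"
    )
    known, other = split_known(parse_warc_fields(payload))
    print(known)
    print(other)
```

Unrecognised fields are kept rather than dropped, which matches the "not limited to" wording.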

Somewhat orthogonal to the namespacing question, @ibnesayeed suggested that if people end up using non-standard fields that might be of use to others, it might be worth adding them to a list like the one we started for header fields. That sounds sensible to me, and widely adopted fields can later be incorporated into the standard. (If anyone wants to do it: send a pull request on this repo.)