Metadata and/or preface

mstade commented 10 years ago

It's common for documents – particularly larger ones, such as books – to include metadata. Such metadata might include a list of authors, the publication date, addresses of different kinds (URLs, email, physical), etc. However, even smaller documents often include metadata. Consider something like a project description, e.g. a README, which may contain links to licenses or other projects; an example of this would be RFCs.

What might be a good way to represent this in a structured manner, such that it's easily extracted yet not restrictive for authors? Some considerations to make:

- Should be possible to read before any other part of a document; particularly important to streaming parsers where the metadata block might contain functionality modifiers (e.g. profile links.)
  - However, some content (such as README files) may actually be less accessible if this is a hard requirement (i.e. a required initial block if too large might deter a reader from the meat of the document.)
- Might include multiple disparate types of metadata, yet still grouped because they are in fact metadata; thus it might be necessary to allow any type of content, not just simplify to key/value pairs.
  - Key/value pairs are common for technical data which may or may not be important for rendering, but is important for machine consumption. This would allow accurate representation of HTTP messages, for instance. It might be useful to consider this a common special case.
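
To make the key/value special case a bit more concrete, here is a minimal Python sketch. It assumes HTTP-style `Name: value` lines, and the function name `parse_pairs` is made up for illustration; nothing here is settled syntax. Lines without a colon are skipped rather than rejected, since the block should still be free to hold other content.

```python
from typing import List, Tuple


def parse_pairs(block: str) -> List[Tuple[str, str]]:
    """Collect HTTP-style 'Name: value' lines from a metadata block.

    Hypothetical helper: lines without a colon are treated as free-form
    content and skipped, not reported as errors.
    """
    pairs = []
    for line in block.splitlines():
        # Split on the first colon only, so values may contain colons (URLs).
        name, sep, value = line.partition(":")
        if sep and name.strip():
            pairs.append((name.strip(), value.strip()))
    return pairs


example = "Content-Type: text/plain\nAuthor: Jane Doe\nJust a free-form sentence."
print(parse_pairs(example))
# → [('Content-Type', 'text/plain'), ('Author', 'Jane Doe')]
```

A list of pairs is used instead of a dict so that repeated names and their order survive, which matters if the goal is to represent HTTP messages accurately. Note that a free-form sentence containing a colon would be misread as a pair by this sketch, which is one argument for marking key/value content explicitly.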

mstade commented 10 years ago

While this issue started out discussing whole-document metadata, there are plenty of cases where it will be useful to include metadata for specific sections or even primitive parts of a document. Consider the following example:

Out-bound navigational links are links that point to a different document altogether;
they link "out" of the document. Links of this type have an LO [H-factor].

[H-factor]: <http://amundsen.com/hypermedia/hfactor/>; rel: describedby

While the syntax here isn't finalized by any means, it does highlight the case of in-document metadata. The link has the optional rel attribute specified, to describe the link relation. Such metadata is really important, and will play a huge role in implementing #5.
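
As a rough illustration of how a parser might pick up such in-document metadata, here is a Python sketch that reads the link definition from the example above. It assumes the exact shape shown there (`[label]: <url>` followed by optional `; key: value` attributes); the pattern and the name `parse_link_definition` are hypothetical, and the syntax itself is not finalized.

```python
import re

# Assumed shape: [label]: <url>; key: value; key: value
LINK_DEF = re.compile(r"^\[(?P<label>[^\]]+)\]:\s*<(?P<url>[^>]+)>(?P<attrs>.*)$")


def parse_link_definition(line: str) -> dict:
    """Parse one link definition line into label, URL and attributes."""
    match = LINK_DEF.match(line.strip())
    if match is None:
        raise ValueError(f"not a link definition: {line!r}")
    attrs = {}
    # Attributes follow the URL, separated by semicolons.
    for part in match.group("attrs").split(";"):
        key, sep, value = part.partition(":")
        if sep:
            attrs[key.strip()] = value.strip()
    return {
        "label": match.group("label"),
        "url": match.group("url"),
        "attrs": attrs,
    }


line = "[H-factor]: <http://amundsen.com/hypermedia/hfactor/>; rel: describedby"
print(parse_link_definition(line)["attrs"])
# → {'rel': 'describedby'}
```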

The point of this comment is to highlight that metadata can apply to either the whole document, or specific parts of it. These are two distinct features, both equally important.

mstade commented 10 years ago

I'm leaning towards there being two kinds of metadata, just like in HTML where there's a <head> element to include things such as <meta> elements, and microdata (the whole itemscope thing.)

How to demarcate document metadata from other content is up for debate however. There are a few examples of prior art that have different takes on the matter:

Evidently, the key/value pair serialization is popular, but I'm not convinced it is anything more than a special case. It may well be that a metadata block will more or less always be more useful to machine readers than to humans, but I'm not sure. I think it'd probably alienate a bunch of non-technical users if the document metadata block is YAML or some other machine-friendly format, because even if it's fairly readable it lacks that beautiful sense of structured chaos you can experience with truly free content.

I think the sweet spot is some sort of semantics where within the metadata block, content can be anything but some content has special meaning. This could be for instance describing the key/value pair semantics while still allowing things such as lists to make an appearance. Possibly it'd be useful to exclude some content from the metadata block, such as headings, but I'm far from sure about this. My gut tells me to just regard the metadata block as a "document within the document" and that anything goes.

Regardless of the format of the content within the metadata block, I think the block itself should be clearly demarcated. I don't like the idea of simply saying that if the document starts with a paragraph, it's metadata. I think that's too ambiguous. What if I genuinely just wanted to start my document with a simple paragraph? What if my document is a haiku? No, I think I prefer the idea Jekyll has with its YAML front matter – clearly demarcated metadata through the use of a triple-dash start and a triple-dash end. The specific syntax can be bikeshedded, but I like the idea of making it explicit. It reduces the risk of ambiguity while also providing clear visual cues of which part is which.
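
To show why explicit demarcation keeps things unambiguous, here is a minimal Python sketch of splitting a document into a metadata block and a body. It assumes a triple-dash line opens and closes the block, in the spirit of Jekyll's front matter; it is not Jekyll's implementation, the name `split_front_matter` is hypothetical, and the metadata is returned as raw text since its inner format is still undecided.

```python
from typing import Optional, Tuple


def split_front_matter(text: str, fence: str = "---") -> Tuple[Optional[str], str]:
    """Return (metadata, body); metadata is None when no block is present."""
    lines = text.splitlines()
    # The block only counts if the very first line is the opening fence.
    if not lines or lines[0].strip() != fence:
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == fence:
            metadata = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return metadata, body
    # Opening fence without a closing one: treat it all as ordinary content.
    return None, text


haiku = "An old silent pond\nA frog jumps into the pond\nSplash! Silence again."
doc = "---\nauthor: mstade\n---\n" + haiku

print(split_front_matter(haiku)[0])
# → None
print(split_front_matter(doc)[0])
# → author: mstade
```

The haiku comes back untouched with no metadata, while the fenced document yields its block separately, so a leading paragraph is never mistaken for metadata.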

mstade / markette

Metadata and/or preface #1