mstade / markette

Deliciously minimalistic markup.
MIT License

Metadata and/or preface #1

Open mstade opened 10 years ago

mstade commented 10 years ago

It's common for documents – particularly larger ones, such as books – to include metadata. Such metadata might include a list of authors, the publication date, addresses of different kinds (URLs, email, physical), etc. However, even smaller documents often include metadata. Consider something like a project description, e.g. a README, which may contain links to licenses or other projects; RFCs are another example.

What might be a good way to represent this in a structured manner such that it's easily extracted, yet not restrictive in authoring? Some considerations to make:

mstade commented 10 years ago

While this issue started out discussing whole-document metadata, there are plenty of cases where it will be useful to include metadata for specific sections or even primitive parts of a document. Consider the following example:

Out-bound navigational links are links that point to a different document altogether;
they link "out" of the document. Links of this type have an LO [H-factor].

[H-factor]: <http://amundsen.com/hypermedia/hfactor/>; rel: describedby

While the syntax here isn't finalized by any means, it does highlight the case of in-document metadata. The link has the optional rel attribute specified, to describe the link relation. Such metadata is really important, and will play a huge role in implementing #5.
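To make the example above concrete, here is a minimal Python sketch of how such a link definition could be read. It assumes the unfinalized `[label]: <url>; key: value` shape shown in the example; the pattern and the function name `parse_link_definition` are hypothetical and not part of any specification.

```python
import re

# Hypothetical pattern for the unfinalized syntax in the example:
#   [label]: <url>; key: value; key: value
LINK_DEF = re.compile(r"^\[(?P<label>[^\]]+)\]:\s*<(?P<url>[^>]+)>(?P<attrs>.*)$")


def parse_link_definition(line):
    """Return label, url and attributes, or None if the line is not a link definition."""
    match = LINK_DEF.match(line.strip())
    if match is None:
        return None
    attrs = {}
    # Everything after the URL is treated as ';'-separated "key: value" pairs.
    for part in match.group("attrs").split(";"):
        key, sep, value = part.partition(":")
        if sep and key.strip():
            attrs[key.strip()] = value.strip()
    return {"label": match.group("label"), "url": match.group("url"), "attrs": attrs}


definition = parse_link_definition(
    "[H-factor]: <http://amundsen.com/hypermedia/hfactor/>; rel: describedby"
)
```

Here `definition["attrs"]` holds the `rel` value as metadata attached to that one link rather than to the document as a whole.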

The point of this comment is to highlight that metadata can apply to either the whole document, or specific parts of it. These are two distinct features, both equally important.

mstade commented 10 years ago

I'm leaning towards there being two kinds of metadata, just like in HTML, where there's a <head> element to include things such as <meta> elements, and microdata (the whole itemscope thing).

How to demarcate document metadata from other content is up for debate, however. There are a few examples of prior art that have different takes on the matter:

Evidently, the key/value pair serialization seems popular, but I'm not convinced this is not just a special case. It may well be that a metadata block will more or less always be more useful to machine readers than to humans, but I'm not sure. I think it'd probably alienate a bunch of non-technical users if the document metadata block is YAML or some other machine-friendly format, because even if it's fairly readable, it lacks that beautiful sense of structured chaos you can experience with truly free content.

I think the sweet spot is some sort of semantics where, within the metadata block, content can be anything but some content has special meaning. For instance, this could mean defining key/value pair semantics while still allowing things such as lists to make an appearance. Possibly it'd be useful to exclude some content from the metadata block, such as headings, but I'm far from sure about this. My gut tells me to just regard the metadata block as a "document within the document" and that anything goes.
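One way to picture "anything goes, but some content has special meaning" is the Python sketch below. It assumes that a line shaped like `Key: value` is the special case and that every other line is kept untouched as free content; the pattern, the function name `read_metadata_block`, and the sample block are all illustrative assumptions, not a proposed syntax.

```python
import re

# Assumed shape of a "special" line: a short key, a colon, then a value.
KEY_VALUE = re.compile(r"^(?P<key>[A-Za-z][\w -]*):\s+(?P<value>\S.*)$")


def read_metadata_block(block):
    """Return (fields, free_content) for the text of a metadata block."""
    fields = {}
    free_content = []
    for line in block.splitlines():
        match = KEY_VALUE.match(line)
        if match:
            # Lines that look like key/value pairs get special meaning.
            fields[match.group("key").strip()] = match.group("value").strip()
        else:
            # Everything else stays in the block, in order, as ordinary content.
            free_content.append(line)
    return fields, free_content


fields, free_content = read_metadata_block(
    "Author: mstade\n"
    "This draft is still rough.\n"
    "- first list item\n"
    "- second list item"
)
```

A real design would need to decide what happens when ordinary prose happens to contain a colon, which is exactly the kind of ambiguity the comment is wary of.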

Regardless of the format of the content within the metadata block, I think the block itself should be clearly demarcated. I don't like the idea of simply saying that if the document starts with a paragraph, it's metadata. I think that's too ambiguous. What if I genuinely just wanted to start my document with a simple paragraph? What if my document is a haiku? No, I think I prefer the idea Jekyll has with its YAML front matter – clearly demarcated metadata through the use of a triple-dash start and a triple-dash end. The specific syntax can be bikeshedded, but I like the idea of making it explicit. It reduces the risk of ambiguity while also providing clear visual cues of which part is which.
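A rough Python sketch of explicit demarcation, assuming a Jekyll-style triple-dash line opens and closes the block. The function name `split_front_matter` and the sample documents are made up for illustration, and the delimiter is a parameter precisely because the syntax is still open.

```python
def split_front_matter(text, delimiter="---"):
    """Split text into (metadata, body).

    Metadata is only recognised when the very first line is the delimiter
    and a matching closing delimiter follows; otherwise metadata is None
    and the whole text is returned unchanged as the body.
    """
    lines = text.split("\n")
    if lines[0].strip() != delimiter:
        return None, text
    for index in range(1, len(lines)):
        if lines[index].strip() == delimiter:
            metadata = "\n".join(lines[1:index])
            body = "\n".join(lines[index + 1:])
            return metadata, body
    # Opening delimiter without a closing one: treat everything as body.
    return None, text


with_metadata = split_front_matter("---\nauthor: mstade\n---\nFirst paragraph.\n")
plain_document = split_front_matter("Just a simple opening paragraph.\n")
```

A document that simply starts with a paragraph, or a haiku, comes back with `None` for the metadata, so nothing is guessed from position alone.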