iilab / contentascode

Content as Code
http://iilab.github.io/contentascode
GNU General Public License v3.0

Transclusion - Block level metadata in Markdown - i.e. middlematter #12

Open jmatsushita opened 8 years ago

jmatsushita commented 8 years ago

Especially in content reuse scenarios, keeping metadata (in particular provenance metadata) is key to keeping track of upstream changes in complex aggregated content pipelines.

In addition to (awesome) approaches such as @elationfoundation, which generates JSON-LD from Jekyll frontmatter documents, we will surely need to manage metadata for more granular blocks of content, for instance in scenarios where larger documents are made of smaller parts.

There is a scenario where the content editor aggregates smaller files, each with its own metadata, but this seems to break the model of having Markdown fit the view/perspective of the author rather than bend to technical requirements.

From a usability standpoint, the ability to add "chapters" or "sections" in this way without creating folders, subfolders and small files is important to consider.

When I think about this, it seems to lead to the possibility of adding block level metadata in Markdown. Given that adding metadata at the top is a well-adopted practice across static website generators and is called frontmatter, metadata added in the middle instead of at the front could therefore be called middlematter.

I'm thinking for instance of this type of syntax (with YAML single line maps):

1.

# An H1 without metadata
## An H2 level block with metadata --- {creator: 'seamus', source: 'elationfoundation/using-tor-browser-bundle'}

or

2.

# An H1 without metadata
## An H2 level block with metadata 
--- {creator: 'seamus', source: 'elationfoundation/using-tor-browser-bundle'}

or

3.

# An H1 without metadata
## An H2 level block with metadata 

---
creator: 'seamus'
source: 'elationfoundation/using-tor-browser-bundle'

---

I prefer 2. because it looks like a byline.
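
To make option 2 concrete, here is a minimal Python sketch of how a tool might read the byline that follows a heading. It is only an illustration under stated assumptions: the function name `parse_middlematter` is made up, the metadata is limited to single-quoted `key: 'value'` pairs (a real implementation would hand the braces to a YAML parser), and headings inside code blocks are not handled.

```python
import re

# Matches a byline such as: --- {creator: 'seamus', source: 'a/b'}
BYLINE = re.compile(r"^---\s*\{(.*)\}\s*$")
# Matches one key: 'value' pair inside the braces (single-quoted values only).
PAIR = re.compile(r"(\w+)\s*:\s*'([^']*)'")
# Matches an ATX heading and captures its level markers and title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def parse_middlematter(markdown):
    """Return a list of (level, title, metadata) tuples, one per heading."""
    blocks = []
    lines = markdown.splitlines()
    for i, line in enumerate(lines):
        heading = HEADING.match(line)
        if not heading:
            continue
        metadata = {}
        # Option 2: the metadata byline sits on the line right after the heading.
        if i + 1 < len(lines):
            byline = BYLINE.match(lines[i + 1])
            if byline:
                metadata = dict(PAIR.findall(byline.group(1)))
        blocks.append((len(heading.group(1)), heading.group(2), metadata))
    return blocks


doc = """# An H1 without metadata
## An H2 level block with metadata
--- {creator: 'seamus', source: 'elationfoundation/using-tor-browser-bundle'}
"""
blocks = parse_middlematter(doc)
```

Here `blocks[1]` carries the `creator` and `source` keys, while the H1 gets an empty metadata dict.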

A content editor (#5) should probably hide inline block level metadata (the way prose hides document level metadata), and allow surfacing it when needed.

jmatsushita commented 8 years ago

Some interesting Markdown flavors:

MEMOFON

- _italic_
- **bold**
- [links](http://google.com)
 - images
    ![](/images/doc/grasshopper.png)
- blockquote
  > The question is, whether you can make
  >> words mean so many different things.
- code
    var test = function test() {
      return this.isTest();
    };

produces this mind map:

And a thread on the hCal microformat in Markdown, which includes some thoughts on a syntax such as:

(startdate-optional enddate)[description/title  <at>  location]

for example:

(23rd June 2002)[Big Meeting  <at>  Room 200, Bldg 3]
(10am-2pm)[World Cup game]
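
As a rough illustration of how such a line could be recognised, the sketch below parses the two examples above with a regular expression. The name `parse_event` is hypothetical, accepting `@` as an alternative spelling of `<at>` is my assumption, and the date part is kept as raw text because the thread does not say how start and end dates should be split.

```python
import re

# (when)[what <at> where]; the location part is optional.
# Both "<at>" as written in the thread and "@" are accepted (an assumption).
EVENT = re.compile(
    r"^\((?P<when>[^)]+)\)"
    r"\[(?P<what>.*?)(?:\s+(?:<at>|@)\s+(?P<where>[^\]]+))?\]$"
)


def parse_event(line):
    """Parse one event line into a dict, or return None if it does not match."""
    match = EVENT.match(line.strip())
    if not match:
        return None
    where = match.group("where")
    return {
        "when": match.group("when").strip(),
        "what": match.group("what").strip(),
        "where": where.strip() if where else None,
    }


meeting = parse_event("(23rd June 2002)[Big Meeting  <at>  Room 200, Bldg 3]")
game = parse_event("(10am-2pm)[World Cup game]")
```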

seamustuohy commented 8 years ago

Check out Substance. It seems to have an interface for adding custom content types that might make it a good choice for an editing environment.

jmatsushita commented 8 years ago

There are several discussions here. I'll try to unpack the various topics while keeping this issue to track the bigger picture.

Starting with the bigger picture, I think the actual problem that's interesting here is transclusion. As mentioned here https://www.mediawiki.org/wiki/Transclusion

Ted Nelson coined the term "transclusion," as well as "hypertext" and "hypermedia", in his 1982 book, Literary Machines.

There are questions on a few different topics that interact in various ways. They need to be well aligned with each other, and with the objective of keeping source documents readable.

Infrastructure of transclusion

How is content that includes transcluded content kept up to date? This is fairly straightforward on a single platform (with subtleties) but harder in a distributed heterogeneous environment, where you start needing ETags, fragment caching and Fragment Identifiers. Doing this with static content generation adds another interesting challenge (including graceful degradation with simple webservers, and taking advantage of more advanced caching strategies).
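
To illustrate the ETag part, here is a small Python sketch of revalidating a cached fragment. Everything in it is an assumption made for the example: `FragmentCache`, `etag_for` and `fake_fetch` are invented names, the upstream server is an in-memory dict rather than real HTTP, and the ETag is simply a truncated content hash.

```python
import hashlib


def etag_for(content):
    """Derive a validator from the fragment text (a truncated content hash)."""
    return '"' + hashlib.sha256(content.encode("utf-8")).hexdigest()[:16] + '"'


class FragmentCache:
    """Keeps transcluded fragments and revalidates them with their ETag."""

    def __init__(self, fetch):
        # fetch(uri, if_none_match) -> (status, etag, body); hypothetical transport.
        self._fetch = fetch
        self._entries = {}

    def get(self, uri):
        cached = self._entries.get(uri)
        status, etag, body = self._fetch(uri, cached[0] if cached else None)
        if status == 304 and cached:
            return cached[1]  # upstream unchanged, reuse the cached copy
        self._entries[uri] = (etag, body)
        return body


# An in-memory stand-in for an upstream server.
upstream = {"example.org/page#intro": "Original paragraph."}
transfers = []  # records every time a full body is sent


def fake_fetch(uri, if_none_match):
    body = upstream[uri]
    etag = etag_for(body)
    if if_none_match == etag:
        return 304, etag, None
    transfers.append(uri)
    return 200, etag, body


cache = FragmentCache(fake_fetch)
first = cache.get("example.org/page#intro")
second = cache.get("example.org/page#intro")  # revalidated, no body transferred
upstream["example.org/page#intro"] = "Edited paragraph."
third = cache.get("example.org/page#intro")   # ETag changed, body fetched again
```

A static site generator could run the same check at build time and only rebuild pages whose transcluded fragments changed.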

Addressing or URIs for fragments

Addressing fragments consistently within Markdown is an interesting opportunity: leveraging the AST of pandoc, but also thinking about content addressability and smart things like content-defined chunking in dat, and using NLP in the process to keep track of content and do smart custom merges with git (like daff for CSV or this JSON custom Git merge driver). This matters because addressing fragments via document structure alone might not be enough to deal with moving paragraphs, inserting paragraphs, changing heading titles and so on. Looking in the direction of Operational Transforms might also offer a more sustainable approach to diffing and merging (and a functional, monadic one: describing only a suite of operations might be more composable).
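
A minimal sketch of the content addressability idea, in Python: each block gets an address derived from its text rather than from its position, so the address survives moves and insertions. The helper names (`blocks_of`, `address`, `index`) are made up for the example, and splitting on blank lines is a crude stand-in for walking a real AST such as pandoc's.

```python
import hashlib
import re


def blocks_of(markdown):
    """Split a document into blocks separated by one or more blank lines."""
    return [b.strip() for b in re.split(r"\n\s*\n", markdown) if b.strip()]


def address(block):
    """Hash the block with whitespace collapsed, so reflowing keeps the address."""
    normalized = " ".join(block.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]


def index(markdown):
    """Map each content address to the block's current position."""
    return {address(b): pos for pos, b in enumerate(blocks_of(markdown))}


before = "# Title\n\nFirst paragraph.\n\nSecond paragraph.\n"
after = "# Title\n\nAn inserted paragraph.\n\nSecond paragraph.\n\nFirst\nparagraph.\n"

target = address("First paragraph.")
position_before = index(before)[target]
position_after = index(after)[target]
```

The same address resolves to position 1 before and position 3 after the edit. Any change to the wording produces a new address, though, which is where similarity matching or an operation log would have to take over.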

Syntax for transclusion

This was the original topic of this issue and discussed here: https://github.com/iilab/contentascode/issues/25#issuecomment-196682914.

There are a few needs:

There are a few possible implementation approaches:

Minimising the number of extensions to existing standards is obviously best. It's also worth noting that the YAML/Markdown combo is a bit of an ad-hoc extension, which is actually not necessarily well supported outside of the SSG world.

In a minimal implementation, extending Markdown only to do transclusion might be enough, because the metadata could be implicit: either inferred from the transclusion statement (external source and so on) or present in the transcluded content (either as published metadata at the fragment level, or in the destination source content if the system uses a content as code approach).
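
Here is a small Python sketch of that minimal case: the only extension is a transclusion statement, and the provenance metadata is inferred from the statement itself. The `[[uri]]` form on its own line and the `expand` function are assumptions for illustration, not a settled syntax, and the resolver is a plain dict lookup standing in for whatever would fetch the fragment.

```python
import re

# A whole line of the form [[uri]] is treated as a transclusion statement.
TRANSCLUSION = re.compile(r"^\[\[(?P<uri>[^\]\s]+)\]\]\s*$")


def expand(markdown, resolve):
    """Inline transcluded fragments and collect implicit provenance metadata."""
    output, provenance = [], []
    for number, line in enumerate(markdown.splitlines(), start=1):
        match = TRANSCLUSION.match(line)
        if not match:
            output.append(line)
            continue
        uri = match.group("uri")
        output.append(resolve(uri))
        # The only metadata recorded is what the statement itself implies.
        provenance.append({"source": uri, "line": number})
    return "\n".join(output), provenance


fragments = {"file://example.org/page#section-paragraph": "Text owned by example.org."}
text, provenance = expand(
    "# Guide\n\n[[file://example.org/page#section-paragraph]]\n",
    fragments.__getitem__,
)
```

The source document stays readable, and the provenance list can be emitted as document-level metadata at build time.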

Other topics

In a content package approach (possibly with npm publishing), all dependencies in the tree would be inherited up through each layer (each subsection might be treated as an implicit package), up to the top package metadata.

The transitivity of transclusion dependencies might be an interesting topic.
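
As a sketch of what transitivity means in practice, the Python below collects everything a package depends on through chains of transclusion. The package names and the `direct` mapping are hypothetical, and the function deliberately tolerates cycles, since two pieces of content could transclude each other.

```python
def transitive_dependencies(package, direct):
    """Return every package reachable from `package` through transclusion."""
    seen = set()
    stack = list(direct.get(package, ()))
    while stack:
        current = stack.pop()
        if current in seen or current == package:
            continue  # already visited, or a cycle back to the start
        seen.add(current)
        stack.extend(direct.get(current, ()))
    return seen


# Hypothetical direct dependencies: package -> packages it transcludes.
direct = {
    "guide": ["chapter-tor"],
    "chapter-tor": ["elationfoundation/using-tor-browser-bundle", "glossary"],
    "glossary": ["chapter-tor"],  # a cycle, which the traversal tolerates
}

guide_deps = transitive_dependencies("guide", direct)
glossary_deps = transitive_dependencies("glossary", direct)
```

The top package would then list `guide_deps` in its metadata, so provenance for deeply nested fragments is not lost.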

Dealing with modified transcluded content is another challenge that relates to content/fragment addressing. Again it seems that an OT approach would be a good way to deal with this. In that case, where would the metadata about transformations be stored? Maybe the flow would be:

So in the end I would have:

  index.md                              # [[file://example.org/page#section-paragraph]]
  example.org.page#section-paragraph    # locally modified version with maybe added metadata.

Using the fragment URI as the file name might help with usability and avoid having to add metadata in the simple case. A companion .ot file could be created with modifications...
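
The naming convention above can be sketched in a few lines of Python. The function names `local_name` and `ot_name` are made up for the example, and the `.ot` suffix is just the companion file idea mentioned above, not an existing format.

```python
from urllib.parse import urlsplit


def local_name(fragment_uri):
    """Turn a fragment URI into the flat file name used for a local override."""
    parts = urlsplit(fragment_uri)
    # Host and path are joined with dots; the fragment is kept after '#'.
    name = parts.netloc + parts.path.replace("/", ".")
    if parts.fragment:
        name += "#" + parts.fragment
    return name


def ot_name(fragment_uri):
    """Name of the companion file that would hold the recorded modifications."""
    return local_name(fragment_uri) + ".ot"


override = local_name("file://example.org/page#section-paragraph")
companion = ot_name("file://example.org/page#section-paragraph")
```

One thing to watch: `#` is legal in file names on most systems but needs quoting in shells and escaping in URLs, so a different separator might be more practical.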

jmatsushita commented 8 years ago

For block level metadata MSON looks pretty cool. The capacity to render to JSON-Schema is really interesting.

I don't think the spec suggests a way to mix Markdown and MSON though. It's not clear if the triple dash approach is really solid, but apparently pandoc also supports it inside documents (i.e. it is ignored when parsed). There is a very similar discussion to the original post about metadata in documents (http://talk.commonmark.org/t/jekyll-style-do-not-show-or-parse-sections/918), which also mentions that the triple dash syntax is a bit too verbose for short metadata inclusion.

Also, jgm mentions that it's not clear what contains what and what is an extension of what.

Interesting to note that there's no real commenting syntax in Markdown (but some really creative ways to comment still). There's a discussion on the CommonMark discussion site.

There's also a discussion on the CommonMark site about transclusion syntax, and a great list of Markdown extensions: https://github.com/jgm/CommonMark/wiki/Proposed-Extensions

I think for now I'm settling on:

jmatsushita commented 8 years ago

There's a discussion about endmatter and sectionmatter on the gray-matter issue tracker! https://github.com/jonschlinkert/gray-matter/issues/20