tree-sitter-grammars / tree-sitter-markdown

Markdown grammar for tree-sitter
MIT License

Allow YAML metadata blocks anywhere in the document #76

Open ghost opened 1 year ago

ghost commented 1 year ago

Right now, a YAML metadata block is parsed as a metadata block only if it occurs at the beginning of a markdown document. Pandoc allows it to occur anywhere:

A YAML metadata block is a valid YAML object, delimited by a line of three hyphens (---) at the top and a line of three hyphens (---) or three dots (...) at the bottom. The initial line --- must not be followed by a blank line. A YAML metadata block may occur anywhere in the document, but if it is not at the beginning, it must be preceded by a blank line.
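
The delimiter rules quoted above can be sketched as a small line-based scan. This is only an illustration in Python, not how Pandoc or this grammar implements it. The function name `find_metadata_blocks` is hypothetical, and the sketch deliberately does not check that the contents are valid YAML.

```python
def find_metadata_blocks(text):
    """Return (start, end) line-index pairs of candidate metadata blocks.

    Only the delimiter rules are checked; YAML validity is NOT verified.
    """
    lines = text.split("\n")
    blocks = []
    i = 0
    while i < len(lines):
        opens = (
            lines[i].rstrip() == "---"
            # If not at the beginning, it must be preceded by a blank line.
            and (i == 0 or lines[i - 1].strip() == "")
            # The opening --- must not be followed by a blank line.
            and i + 1 < len(lines)
            and lines[i + 1].strip() != ""
        )
        if opens:
            # Search for the closing delimiter: --- or ...
            for j in range(i + 1, len(lines)):
                if lines[j].rstrip() in ("---", "..."):
                    blocks.append((i, j))
                    i = j  # continue scanning after the closing line
                    break
        i += 1
    return blocks
```

For example, `find_metadata_blocks("Intro\n\n---\ntitle: x\n---\n\nBody")` reports one candidate block spanning line indices 2 to 4.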

MDeiml commented 1 year ago

Hm, you're right. The problem, though, is that Pandoc only parses it as a metadata block if it is valid YAML. That's of course a bit hard to verify and probably out of scope for this parser. Just assuming that the contents are valid would work, of course, but that would mean there are a lot of false positives. E.g.

Paragraph followed by thematic break

---

More text that is not valid YAML

---

Even more text

would be parsed as a metadata block. This is not so much a problem if the metadata is at the beginning of the file, as documents usually don't start with a thematic break.

I guess the best way is to use some heuristics in case the block is not at the start of the file. For example, only parse something as a metadata block if the first line contains a `:`.
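
A minimal sketch of that heuristic, in Python for illustration only. The function `looks_like_metadata` and its parameters are hypothetical names, not part of the grammar, and the real check would have to live in the tree-sitter grammar or its external scanner.

```python
def looks_like_metadata(block_lines, at_document_start):
    """Decide whether a block between --- delimiters is treated as metadata.

    block_lines: the lines between the opening and closing delimiters.
    at_document_start: True if the opening --- is the first line of the file.
    """
    if at_document_start:
        # Documents usually don't start with a thematic break,
        # so a block at the very beginning is accepted as-is.
        return True
    # Elsewhere, require a ':' in the first line of the block.
    return bool(block_lines) and ":" in block_lines[0]
```

With this sketch, `looks_like_metadata(["More text that is not valid YAML"], False)` is rejected, while `looks_like_metadata(["bibliography: refs.bib"], False)` is accepted. Ordinary prose whose first line happens to contain a colon would still be a false positive.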

Any suggestions?

MDeiml commented 1 year ago

Also, do you have an example of where you would use this? That might help with getting the heuristics right.

ghost commented 1 year ago

Hm, for some reason I thought thematic breaks always need a blank line before and after -- that would (I think) distinguish them from YAML metadata delimiters.

As for usage: since you can specify a bibliography in YAML metadata blocks, I usually put the block at the bottom of the document. Similarly, it makes sense to put one at the end of each section/chapter if there are several chapters in the same document.

MDeiml commented 1 year ago

Hm, for some reason I thought thematic breaks always need a blank line before and after -- that would (I think) distinguish them from YAML metadata delimiters.

They do not (see https://spec.commonmark.org/0.30/#thematic-breaks), but I think most people use blank lines anyway, so what you're saying still holds in practice.

As for usage: since you can specify a bibliography in YAML metadata blocks, I usually put the block at the bottom of the document. Similarly, it makes sense to put one at the end of each section/chapter if there are several chapters in the same document.

Sounds reasonable. Sadly, it's hard to detect the "end" of something in tree-sitter, so I don't think I can use that as a signal. Thanks though :)