nim-lang / RFCs

A repository for your Nim proposals.

Nim's extension to Markdown #536

Open a-mr opened 9 months ago

a-mr commented 9 months ago

Abstract

This RFC proposes a new markup syntax that preserves the orderly, well-structured nature of RST, e.g.:

For adding things to index use
    "sink"{idx}
instead of
    `sink`:idx:

For admonitions use
    """Warning
    text...
    """
instead of:
    .. Warning::
      text
For document metadata use
    {author="Andreas Rumpf",
     version=|nimversion|}
instead of
    :Author: Andreas Rumpf
    :Version: |nimversion|

Basically, the relation of triple """ to single " is the same as that of triple ``` to single `. The syntax is partially inspired by Nim pragmas and Pandoc Markdown extensions. It does not imply adding new features, except those in the section "Future directions" below.
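The metadata mapping shown above is mechanical enough to sketch as a small converter. This is only an illustration, not part of the proposal: the function name `fields_to_pragma` is hypothetical, and lowercasing the keys and quoting every value except |substitutions| are assumptions read off the single example above.

```python
import re

# Matches one RST field-list line such as ":Author: Andreas Rumpf".
FIELD = re.compile(r'^:(\w+):\s*(.*)$')

def fields_to_pragma(lines):
    """Turn RST field-list lines into one {key=value, ...} pragma string."""
    pairs = []
    for line in lines:
        m = FIELD.match(line.strip())
        if not m:
            continue  # not a field line, ignored in this sketch
        key, value = m.group(1).lower(), m.group(2)
        # Assumption: substitutions like |nimversion| stay unquoted,
        # everything else becomes a quoted string.
        if not re.fullmatch(r'\|\w+\|', value):
            value = '"' + value + '"'
        pairs.append(key + '=' + value)
    return '{' + ', '.join(pairs) + '}'

print(fields_to_pragma([':Author: Andreas Rumpf', ':Version: |nimversion|']))
```

The sketch prints the pragma on one line; how to wrap long metadata pragmas across lines is left open here.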

Motivation

RST syntax seems alien to both Nim and Markdown. We'd like to keep RST's features, but we don't want to adopt Pandoc Markdown or any other Markdown dialect. The proposed style is simpler to memorize and more elegant than Pandoc Markdown.

Description

The two ideas are: to use quote marks " for all non-code text fragments whose meaning can be modified, while keeping ` for code; and to use {pragma} syntax everywhere for modifying, i.e. for the document's global metadata and for both blocks and inline spans. Unlike Nim pragmas and Pandoc, we will not use a dot . after {.

Inline things are modified by prepending or appending a pragma to `...` or "...".

  1. Changing the language of inline code is done like this:
    Check `throw`{cpp} or {cpp}`throw`.
  2. Similarly, to change a span we attach a pragma to "...":
    See "some definition"{target} to make an inline target from the text
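The two inline rules above can be sketched as one regular expression. This is a rough illustration under assumptions, not a reference parser: the names `SPAN` and `parse_spans` are hypothetical, a span is taken to carry at most one pragma, and nested braces or escaped quotes are not handled.

```python
import re

# Assumed grammar for this sketch: `code` or "text", with one {pragma}
# directly before or directly after the span.
SPAN = re.compile(
    r'(?:\{(?P<pre>[^{}]*)\})?'        # optional leading pragma
    r'(?P<q>[`"])(?P<body>.+?)(?P=q)'  # `code` or "text"
    r'(?:\{(?P<post>[^{}]*)\})?'       # optional trailing pragma
)

def parse_spans(line):
    """Return (kind, body, pragma) for every span that carries a pragma."""
    spans = []
    for m in SPAN.finditer(line):
        pragma = m.group('pre') or m.group('post')
        if pragma is None:
            continue  # ordinary quotes or plain inline code: left alone
        kind = 'code' if m.group('q') == '`' else 'text'
        spans.append((kind, m.group('body'), pragma))
    return spans

print(parse_spans('Check `throw`{cpp} or {cpp}`throw`.'))
print(parse_spans('See "some definition"{target} here'))
```

Ordinary quoted text without a pragma is skipped, which is the property that keeps normal prose quotes from being treated as markup in this sketch.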

Directives are now split into three groups:

  1. Code. It just continues to use the existing Markdown syntax:
      ```nim
      proc f
      ```
  2. Directives proper, i.e. function-like things introducing new blocks """directive optionalPar=value. Directives are divided into a. those accepting an obligatory "text block" argument and b. those that don't have their own text:

    a. E.g. admonitions:

      """Warning:
        it's a warning
      """

    b. E.g. table of contents:

      """contents
      """
  3. Pragmas modifying global state {variable=value} at the top-level, e.g.:
      {author="Araq", year="2020"}
      At the beginning of text.
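As a rough sketch of how the opening line of a block directive could be taken apart, the snippet below splits it into a name, positional arguments and key=value options. The function `parse_header` is hypothetical, and using shell-like tokenization (Python's `shlex`) is an assumption of this sketch; the proposal does not define quoting or escaping rules.

```python
import shlex

def parse_header(line):
    # Split an opening line such as
    #   """Figure file="path/to/fig.png" height=100px
    # into (name, positional_args, options).
    if not line.startswith('"""'):
        raise ValueError('not a block opening line')
    name, args, options = None, [], {}
    for tok in shlex.split(line[3:]):
        key, sep, value = tok.partition('=')
        if sep:
            options[key] = value          # key=value option
        elif name is None:
            name = tok.rstrip(':')        # "Warning:" and "Warning" both accepted
        else:
            args.append(tok)              # e.g. the file name after "include"
    return name, args, options

print(parse_header('"""Figure file="path/to/fig.png" height=100px width=300px'))
print(parse_header('"""include system.md'))
```

A bare closing line `"""` yields `(None, [], {})` here, so a caller would have to track whether a block is currently open.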

And we continue to use the already added Pandoc-inspired extensions, most importantly the concise referencing syntax [target].

Code Examples

More examples & comparison with RST

This proposal is based on the observation that RST directives are used for 3 distinct things: 1. literal code blocks, 2. visual blocks and inline spans that present normal text which is just packaged a little differently, 3. modifying global state that is not displayed in any way. The difference between 1 and 2 is significant: text in code is handled completely differently, as it is not formatted by any markup language rules but by completely isolated programming-language highlighters. Another confusing thing in RST syntax is that .. is used for comments, but once something like directiveName:: is added after it, .. directiveName:: suddenly becomes an important block.
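The comment-versus-directive ambiguity of .. can be made concrete with a tiny classifier. This is a simplified sketch: `classify` is a hypothetical helper, and it ignores the other RST constructs that also start with .. (hyperlink targets, substitution definitions, footnotes).

```python
import re

# ".. name:: argument" is a directive; any other line starting with ".."
# is treated as a comment in this simplified sketch.
DIRECTIVE = re.compile(r'^\.\.\s+(\w+)::\s*(.*)$')

def classify(line):
    """Return (kind, name_or_text, argument) for one RST line."""
    m = DIRECTIVE.match(line)
    if m:
        return ('directive', m.group(1), m.group(2))
    if line.startswith('..'):
        return ('comment', line[2:].strip(), '')
    return ('text', line, '')

print(classify('.. contents::'))
print(classify('.. just a note to self'))
```

The only thing separating the two results is the trailing `::`, which is the point made in the paragraph above.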

Blocks:

"""include system.md
"""

instead of RST:

.. include:: system.md

"""Figure file="path/to/fig.png" height=100px width=300px
  Caption of the figure
"""

instead of RST:

.. figure:: path/to/fig.png
  :height: 100px
  :width: 300px

  Caption of the figure

"""Warning:
  it's a warning
"""

instead of:

.. Warning:: it's
  a warning

"""contents
"""

instead of:

.. contents::

A possible syntax for adding anchors to paragraphs:

"""target="sink description"
Paragraph text...
"""

Inline spans & code:

Inline targets: "some definition"{target}, where an empty argument to `target` means to re-use the text in quotes,
(instead of RST syntax _`some definition`),
it can be put on inline code fragments like `proc f`{nim,target="code"}.
Then ref. [some definition] and [code].

The .. syntax at the beginning of a line will continue to be used for comments only (as Markdown implementations don't have any sane comment syntax).

Comparison with Pandoc

We don't want to accept Pandoc Markdown because it seems complex and not systematic:

  1. Generally a bit too much new syntax that seems completely arbitrary.
  2. It is unclear why one would need YAML support just to input metadata. In this proposal it's solved by pragmas like {author="Araq", title="Nim Manual"}.
  3. The syntax for inline spans and blocks is not aligned: for inline spans they use [...]{.class} syntax, and for blocks ::: Warning ::: syntax. We will use "..."{class} and """Warning syntax, which is also more aligned with the Markdown code block style.

BTW the dot is present in Pandoc in `code`{.nim} but has a different meaning ("class" name).

Future directions

For future directions, we can continue to follow RST's general principle of re-using existing syntax whenever possible. E.g. if we want to add a glossary feature, which is basically just a definition list + automatic linking/indexing, then we could use something like this:

"""Glossary
DMA
: Direct Memory Access
MMIO
: Memory-Mapped I/O
"""

...
Using [DMA] means...
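A minimal sketch of what "definition list + automatic linking" could mean in practice is given below. Both helper names are hypothetical, and expanding [TERM] into the term plus its definition is only a stand-in for real link generation, which would depend on the output backend.

```python
import re

def parse_glossary(body):
    """Map each term to its definition; a definition line starts with ':'."""
    glossary, term = {}, None
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(':'):
            if term is not None:
                glossary[term] = line[1:].strip()
        else:
            term = line  # remember the term until its definition arrives
    return glossary

def expand_refs(text, glossary):
    """Replace [TERM] with 'TERM (definition)' when TERM is a glossary entry."""
    def repl(m):
        term = m.group(1)
        return f'{term} ({glossary[term]})' if term in glossary else m.group(0)
    return re.sub(r'\[([^\[\]]+)\]', repl, text)

glossary = parse_glossary('DMA\n: Direct Memory Access\nMMIO\n: Memory-Mapped I/O\n')
print(expand_refs('Using [DMA] means...', glossary))
```

References that do not name a glossary term are left untouched, so ordinary [target] references would still go through the normal resolution path.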

For a flexible page structure we can use pragmas like this to put the symbol into another section:

proc f =
  ## {section=5}
  ## Function description...

Backwards Compatibility

Initially the old RST syntax is supported with deprecation warnings in Nim 2.2; backward compatibility is then expected to break in Nim 2.4, where docgen allows only the new syntax. However, one can still use the {.doctype: Rst.} (or RstMarkdown) Nim pragma to turn on the previous markup language.

Araq commented 9 months ago

But we moved from RST to markdown because "Markdown has won". Now, given that Markdown does not support all the features we need and since RST is close to markdown IMO we use some of RST's syntax still. Where is the benefit in coming up with our own syntax that is neither RST, markdown nor pandoc?

a-mr commented 9 months ago

> Where is the benefit in coming up with our own syntax that is neither RST, markdown nor pandoc?

More coherent syntax? Of course, if everyone is satisfied with our current language direction (the mess of features including some Markdown dialects and RST, kinda like C++ in the world of markup languages), then so be it ;-)

Varriount commented 9 months ago

One benefit the .. directive: syntax has is that it is more obviously a syntactical construct, rather than a form of normal punctuation.

On a similar note, it's much more likely that """ will be mistaken by a reader as normal punctuation, simply because of the fact that quotes are actually used in normal writing. Backticks aren't actually used in normal writing all that much, so the backtick syntax has the benefit that it looks "special".

Overall, I'm +0.5 for the proposed syntax. I like the consistency of it, but dislike the fact that it doesn't really stand out all that much (especially the global pragma syntax). Could some other set of symbols be used instead, say &, .., or $?

(If only we could go back in time and get more punctuation added to everyday keyboards)

Araq commented 9 months ago

> the mess of features including some Markdown dialects and RST, kinda like C++ in the world of markup languages

"Markdown + custom features" is not significantly less messy and the Markdown world itself is messy as it's a set of language dialects.

a-mr commented 9 months ago

> On a similar note, it's much more likely that """ will be mistaken by a reader as normal punctuation, simply because of the fact that quotes are actually used in normal writing. Backticks aren't actually used in normal writing all that much, so the backtick syntax has the benefit that it looks "special".

Good point. Maybe it's worth using | for the new syntax instead of ":

See |some definition|{target}

||| Warning:
  it's a warning
|||

> Could some other set of symbols be used instead, say &, .., or $?

Well, using braces is an important point of this proposal — for similarity with Nim and Pandoc. Maybe just add some exclamation before it like:

!!! {author="Araq", year="2020"}

?

Araq commented 9 months ago

If you want to add custom syntax at least do it properly:

  1. Allow for inline HTML like <div>abcdef</div>.
  2. Only allow a subset of HTML and tags which can be mapped to PDF etc and is known not to have security holes.

arnetheduck commented 9 months ago

learning yet another unique set of extensions for nim alone seems to have little enough value that I'd hesitate to write docs at all because of the additional effort of learning yet another syntax for magic stuff with little return - without content, the extras are pointless and Nim docs are far far away from the point where extensions form the bottleneck for greater quality.

+1 for html, it is after all already part of markdown.

fwiw, here's an example of what we use and using something existing seems like a win-win for everyone: https://squidfunk.github.io/mkdocs-material/reference/.

this has enough extensions that one could choke on them - it's rare that I ever think "oh boy, this documentation that has one line of content would be so much better if only the author was properly rendered in italics" - more reuse, less maintenance, smaller learning curve, more time spent on content.

Varriount commented 9 months ago

On that topic, what do other dialects for Markdown use for these purposes?

a-mr commented 9 months ago

> On that topic, what do other dialects for Markdown use for these purposes?

Varriount commented 9 months ago

I don't particularly like using the same syntax for code and normal text, but I must admit they are really consistent.

I mean, if you think about it, "code" blocks do implement a kind of pragma syntax, though it isn't one that's extendable to modifying an entire document.

a-mr commented 8 months ago

> I mean, if you think about it, "code" blocks do implement a kind of pragma syntax, though it isn't one that's extendable to modifying an entire document.

Sure. But for me the biggest intuitive argument against sharing the same syntax for text blocks and code blocks is that of existing syntax highlighting in editors like vim in *.md files: the text is always highlighted while the code is not, which emphasizes their different nature.

To summarize this RFC's discussion:

Plan A is to continue the current approach: use RST syntax as it is, and possibly adapt some basic CommonMark things into Markdown mode, while not adding any more syntax extensions.

Plan B is to adopt Pandoc Markdown syntax extensively, if and only if it finally becomes the de facto standard of Markdown in the future.