jgm / djot

A light markup language
https://djot.net
MIT License

Djot

Djot is a light markup syntax. It derives most of its features from commonmark, but it fixes a few things that make commonmark's syntax complex and difficult to parse efficiently. It is also much fuller-featured than commonmark, with support for definition lists, footnotes, tables, several new kinds of inline formatting (insert, delete, highlight, superscript, subscript), math, smart punctuation, attributes that can be applied to any element, and generic containers for block-level, inline-level, and raw content.

The project began as an attempt to implement some of the ideas I suggested in my essay Beyond Markdown. (See Rationale, below.)

This repository contains a Syntax Description, a Cheatsheet, and a Quick Start for Markdown Users that outlines the main differences between djot and Markdown.

You can try djot on the djot playground without installing anything locally.

Rationale

Here are some design goals:

  1. It should be possible to parse djot markup in linear time, with no backtracking.

  2. Parsing of inline elements should be "local" and not depend on what references are defined later. This is not the case in commonmark: [foo][bar] might be "[foo]" followed by a link with text "bar", or "[foo][bar]", or a link with text "foo", or a link with text "foo" followed by "[bar]", depending on whether the references [foo] and [bar] are defined elsewhere (perhaps later) in the document. This non-locality makes accurate syntax highlighting nearly impossible.

  3. Rules for emphasis should be simpler. The fact that doubled characters are used for strong emphasis in commonmark leads to many potential ambiguities, which are resolved by a daunting list of 17 rules. It is hard to form a good mental model of these rules. Most of the time they interpret things the way a human would most naturally interpret them, but not always.

  4. Expressive blind spots should be avoided. In commonmark, you're out of luck if you want to produce the HTML a<em>?</em>b, because the flanking rules classify the first asterisk in a*?*b as right-flanking. There is a way around this, but it's ugly (using a numerical entity instead of a). In djot there should not be expressive blind spots of this kind.

  5. Rules for what content belongs to a list item should be simple. In commonmark, content under a list item must be indented as far as the first non-space content after the list marker (or five spaces after the marker, in case the list item begins with indented code). Many people get confused when their indented content is not indented far enough and does not get included in the list item.

  6. Parsers should not be forced to recognize Unicode character classes, HTML tags, or entities, or to perform Unicode case folding. Each of these adds a lot of complexity.

  7. The syntax should be friendly to hard-wrapping: hard-wrapping a paragraph should not lead to different interpretations, e.g. when a number followed by a period ends up at the beginning of a line. (I anticipate that many will ask, why hard-wrap at all? Answer: so that your document is readable just as it is, without conversion to HTML and without special editor modes that soft-wrap long lines. Remember that source readability was one of the prime goals of Markdown and Commonmark.)

  8. The syntax should compose uniformly, in the following sense: if a sequence of lines has a certain meaning outside a list item or block quote, it should have the same meaning inside it. This principle is articulated in the commonmark spec, but the spec doesn't completely abide by it (see commonmark/commonmark-spec#634).

  9. It should be possible to attach arbitrary attributes to any element.

  10. There should be generic containers for text, inline content, and block-level content, to which arbitrary attributes can be applied. This allows for extensibility using AST transformations.

  11. The syntax should be kept as simple as possible, consistent with these goals. Thus, for example, we don't need two different styles of headings or code blocks.
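To make goal 4 concrete, here is a minimal sketch of the commonmark flanking test applied to a*?*b. It is an illustration only, not code from any djot or commonmark implementation: the function name `classify_run` is hypothetical, and the sketch assumes ASCII input, treating `string.punctuation` as the punctuation class and the start or end of the text as whitespace.

```python
import string


def classify_run(text: str, start: int, end: int) -> tuple[bool, bool]:
    """Return (left_flanking, right_flanking) for the delimiter run text[start:end].

    Simplified, ASCII-only reading of the commonmark flanking rules.
    """
    # The start and end of the text are treated as whitespace.
    before = text[start - 1] if start > 0 else " "
    after = text[end] if end < len(text) else " "

    ws_before, ws_after = before.isspace(), after.isspace()
    punct_before = before in string.punctuation
    punct_after = after in string.punctuation

    # Left-flanking: not followed by whitespace, and either not followed by
    # punctuation, or preceded by whitespace or punctuation.
    left = not ws_after and (not punct_after or ws_before or punct_before)
    # Right-flanking: the mirror image of the rule above.
    right = not ws_before and (not punct_before or ws_after or punct_after)
    return left, right


text = "a*?*b"
first = classify_run(text, 1, 2)   # the asterisk between "a" and "?"
second = classify_run(text, 3, 4)  # the asterisk between "?" and "b"
print(first, second)
```

Under these assumptions the first asterisk comes out right-flanking only and the second left-flanking only, so the first cannot open emphasis and the second cannot close it. That is why a*?*b produces no emphasis in commonmark.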

These goals motivated the following decisions:

Syntax

For a full syntax reference, see the syntax description.

A vim syntax highlighting definition for djot is provided in editors/vim/.

Implementations

There are currently six djot implementations:

djot.lua was the original reference implementation, but current development is focused on djot.js, and it is possible that djot.lua will not be kept up to date with the latest syntax changes.

File extension

The extension .dj may be used to indicate that the contents of a file are djot-formatted text.

License

The code and documentation are released under the MIT license.