CMSM: Common markup state machine
🪦 Archived: this document is not maintained. This document was made jointly with micromark, which was later also turned into markdown-rs. At present, I don’t have the bandwidth to maintain 2 reference parsers and a spec.


Common markup state machine.

Together, the parsing rules described below define what is referred to as a Common Markup parser.

This document is currently in progress. It is developed jointly with a reference parser: micromark. Contributions are welcome.

Some parts that are still in progress:

  • Adapters
  • Define the regular constructs
  • Adapter for rich text to check whether emphasis, strong, resource, or reference sequences make up syntax or text
  • Tokenizing the input stream in reverse (GFM allows asd@asd.com, so it seems we need some way to match the @ and then parse backwards)
  • Add an appendix of extensions

1 Background

The common markup parser parses a markup language that is commonly known as Markdown.

The first definition of this format gave several examples of how it worked, showing input Markdown and output HTML, and came with a reference implementation (known as Markdown.pl). When new implementations followed, they mostly followed the first definition, but deviated from the first implementation, thus making the format a family of formats.

Some years later, an attempt was made to standardize the differences between implementations, by specifying how several edge cases should be handled, through more input and output examples. This attempt is known as CommonMark, and many implementations now follow it.

This document defines a more formal format, based on CommonMark, by documenting how to parse it, instead of documenting input and output examples. This format is:

The origin story of Markdown is similar to that of HTML, which at one time was also a family of formats. Through incredible efforts of the WHATWG, a Living Standard was created on how to parse the format, by defining a state machine.

2 Overview

The common markup parser receives input, typically coming over the network or from the local file system. This input is represented as characters in the input stream. Depending on a character, certain effects occur: a new token is created, one state is switched to another, or something is labelled. Each line is made up of tokens (such as whitespace, markers, sequences, and content) and labels, both of which are enqueued. At a certain point, it is known what to do with the queue: either it is discarded, or it is used, in which case it is adapted.

The parser parses in three stages: flow, content, and text, respectively coming with their own state machines (flow state machine, content state machine, text state machine), and their own adapters.

3 Infra

This section defines the fundamental concepts upon which this document is built.

A variable is declared in the shared state with let, cleared with unset, or changed with set, increment, decrement, append or prepend.
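
The operations named above can be pictured with a small Python sketch. The class name `SharedSpace`, the method names, and the choice to raise an error when `set` targets an undeclared variable are assumptions made for illustration; this document only names the operations.

```python
class SharedSpace:
    """Hypothetical holder for variables in the shared state."""

    def __init__(self):
        self.values = {}

    def let(self, name, value=None):
        # Declare a variable.
        self.values[name] = value

    def unset(self, name):
        # Clear a variable.
        self.values.pop(name, None)

    def set(self, name, value):
        # Assumption: changing an undeclared variable is an error.
        if name not in self.values:
            raise KeyError(name)
        self.values[name] = value

    def increment(self, name, by=1):
        self.values[name] += by

    def decrement(self, name, by=1):
        self.values[name] -= by

    def append(self, name, item):
        self.values[name].append(item)

    def prepend(self, name, item):
        self.values[name].insert(0, item)
```

For example, a variable `size` could be declared with `let("size", 0)` and then changed with `increment("size")`.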

4 Characters

A character is a Unicode code point and is represented as a four to six digit hexadecimal number, prefixed with U+ ([UNICODE]).

4.1 Character groups

An ASCII digit is a character in the inclusive range U+0030 (0) to U+0039 (9).

An ASCII upper hex digit is a character in the inclusive range U+0041 (A) to U+0046 (F).

An ASCII lower hex digit is a character in the inclusive range U+0061 (a) to U+0066 (f).

An ASCII hex digit is an ASCII digit, ASCII upper hex digit, or ASCII lower hex digit.

An ASCII upper alpha is a character in the inclusive range U+0041 (A) to U+005A (Z).

An ASCII lower alpha is a character in the inclusive range U+0061 (a) to U+007A (z).

An ASCII alpha is an ASCII upper alpha or ASCII lower alpha.

An ASCII alphanumeric is an ASCII digit or ASCII alpha.

An ASCII punctuation is a character in the inclusive ranges U+0021 EXCLAMATION MARK (!) to U+002F SLASH (/), U+003A COLON (:) to U+0040 AT SIGN (@), U+005B LEFT SQUARE BRACKET ([) to U+0060 GRAVE ACCENT (`), or U+007B LEFT CURLY BRACE ({) to U+007E TILDE (~).

An ASCII control is a character in the inclusive range U+0000 NULL (NUL) to U+001F (US), or U+007F (DEL).

A Unicode whitespace is a character in the Unicode Zs (Separator, Space) category, or U+0009 CHARACTER TABULATION (HT), U+000A LINE FEED (LF), U+000C (FF), or U+000D CARRIAGE RETURN (CR) ([UNICODE]).

A Unicode punctuation is a character in the Unicode Pc (Punctuation, Connector), Pd (Punctuation, Dash), Pe (Punctuation, Close), Pf (Punctuation, Final quote), Pi (Punctuation, Initial quote), Po (Punctuation, Other), or Ps (Punctuation, Open) categories, or an ASCII punctuation ([UNICODE]).

An atext is an ASCII alphanumeric, or a character in the inclusive ranges U+0023 NUMBER SIGN (#) to U+0027 APOSTROPHE ('), U+002A ASTERISK (*), U+002B PLUS SIGN (+), U+002D DASH (-), U+002F SLASH (/), U+003D EQUALS TO (=), U+003F QUESTION MARK (?), U+005E CARET (^) to U+0060 GRAVE ACCENT (`), or U+007B LEFT CURLY BRACE ({) to U+007E TILDE (~) ([RFC5322]).

To ASCII-lowercase a character is to increase it by 0x20, if it is an ASCII upper alpha.

To digitize a character, is to decrease it by 0x30, 0x37, or 0x57, if it is an ASCII digit, ASCII upper hex digit, or ASCII lower hex digit, respectively.
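
Several of the groups and the two operations above translate directly into range checks on code points. The following Python sketch is one possible reading, not a normative implementation. The function names are hypothetical, and raising `ValueError` when digitizing a character that is not an ASCII hex digit is an assumption, as this document does not say what happens in that case.

```python
def is_ascii_digit(code: int) -> bool:
    return 0x30 <= code <= 0x39


def is_ascii_upper_hex_digit(code: int) -> bool:
    return 0x41 <= code <= 0x46


def is_ascii_lower_hex_digit(code: int) -> bool:
    return 0x61 <= code <= 0x66


def is_ascii_upper_alpha(code: int) -> bool:
    return 0x41 <= code <= 0x5A


def is_ascii_lower_alpha(code: int) -> bool:
    return 0x61 <= code <= 0x7A


def is_ascii_alphanumeric(code: int) -> bool:
    return (
        is_ascii_digit(code)
        or is_ascii_upper_alpha(code)
        or is_ascii_lower_alpha(code)
    )


def is_ascii_punctuation(code: int) -> bool:
    # The four inclusive ranges listed for ASCII punctuation.
    return (
        0x21 <= code <= 0x2F
        or 0x3A <= code <= 0x40
        or 0x5B <= code <= 0x60
        or 0x7B <= code <= 0x7E
    )


def ascii_lowercase(code: int) -> int:
    # Only ASCII upper alphas change; everything else is returned as-is.
    return code + 0x20 if is_ascii_upper_alpha(code) else code


def digitize(code: int) -> int:
    if is_ascii_digit(code):
        return code - 0x30
    if is_ascii_upper_hex_digit(code):
        return code - 0x37
    if is_ascii_lower_hex_digit(code):
        return code - 0x57
    # Assumption: other characters are rejected.
    raise ValueError(f"not an ASCII hex digit: U+{code:04X}")
```

With these offsets, `F` (U+0046) and `f` (U+0066) both digitize to 15, and `7` (U+0037) digitizes to 7.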

4.2 Conceptual characters

A VIRTUAL SPACE character is a conceptual character representing an expanded column size of a U+0009 CHARACTER TABULATION (HT).

An EOL character is a conceptual character representing a break between two lines.

An EOF character is a conceptual character representing the end of the input.

VIRTUAL SPACE, EOL, and EOF are not real characters, but rather represent an increase in the size of a character, a break between characters, or the lack of any further characters.

4.3 Tabs

Tabs (U+0009 CHARACTER TABULATION (HT)) are typically not expanded into spaces, but do behave as if they were replaced by spaces with a tab stop of 4 characters. These character increments are represented by VIRTUAL SPACE characters.

For the following markup (where ␉ represents a tab):

>␉␉a

We have the characters: U+003E GREATER THAN (>), U+0009 CHARACTER TABULATION (HT), VIRTUAL SPACE, VIRTUAL SPACE, U+0009 CHARACTER TABULATION (HT), VIRTUAL SPACE, VIRTUAL SPACE, VIRTUAL SPACE, and U+0061 (a).

When transforming to an output format, tab characters that are not part of syntax should be present in the output format. When the tab itself (and zero or more VIRTUAL SPACE characters) are part of syntax, but some VIRTUAL SPACE characters are not, the remaining VIRTUAL SPACE characters should be considered a prefix of the content.
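
The expansion described in this section can be sketched in Python as follows. This is a minimal illustration under two assumptions: every character other than a tab occupies exactly one column, and VIRTUAL SPACE is modelled as a sentinel string. The names `expand_line`, `VIRTUAL_SPACE`, and `TAB_STOP` are hypothetical.

```python
HT = 0x09
VIRTUAL_SPACE = "VIRTUAL SPACE"  # conceptual character, modelled as a sentinel
TAB_STOP = 4


def expand_line(line: str) -> list:
    """Return the code points of a line, with each tab followed by the
    VIRTUAL SPACE characters needed to reach the next tab stop."""
    characters = []
    column = 0
    for char in line:
        code = ord(char)
        characters.append(code)
        column += 1
        if code == HT:
            # The tab itself takes one column; pad up to the tab stop.
            while column % TAB_STOP != 0:
                characters.append(VIRTUAL_SPACE)
                column += 1
    return characters
```

Applied to the markup `>␉␉a`, the first tab starts in the second column and is followed by two VIRTUAL SPACE characters, and the second tab starts at a tab stop and is followed by three, which matches the character list given above.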

5 Input stream

The input stream consists of the characters pushed into it.

The input character is the first character in the input stream that has not yet been consumed. Initially, the input character is the first character in the input. When the last character in a line is consumed, the input character is an EOL. Finally, when all characters are consumed, the input character is an EOF.

Any occurrence of U+0009 CHARACTER TABULATION (HT) in the input stream is represented by that character and 0-3 VIRTUAL SPACE characters.
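
As a rough illustration of how the input character progresses, the Python sketch below yields code points, then an EOL between lines, then a final EOF. It assumes that lines are separated by U+000A LINE FEED (LF) only and that no EOL follows the last line; neither detail is confirmed by this section. Tab expansion is left out to keep the sketch short. The names `input_characters`, `EOL`, and `EOF` are hypothetical.

```python
EOL = "EOL"  # conceptual character, modelled as a sentinel
EOF = "EOF"  # conceptual character, modelled as a sentinel


def input_characters(text: str):
    """Yield each input character in order, ending with EOF."""
    lines = text.split("\n")  # assumption: LF is the only line break
    for index, line in enumerate(lines):
        for char in line:
            yield ord(char)
        if index < len(lines) - 1:
            # A break between this line and the next one.
            yield EOL
    yield EOF
```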

5.1 Preprocessing the input stream

The input stream consists of the characters pushed into it as the input is decoded.

The input, when decoded, is preprocessed and pushed into the input stream as described in the following algorithm:

6 Parsing

The states of state machines have certain effects, such as creating items in the queue (tokens and labels). The queue is used by tree adapters when a valid construct is found. After the queue is used, or when a bogus construct is found, the queue is discarded.

The shared space is accessed and mutated by both the tree adapter and the states of the state machine.

Constructs are registered by hooking a case (one or more characters or character groups) into certain states. Upon registration, they define the states used to parse a construct, and the adapter used to handle the construct.

6.1 Tokenization

Implementations must act as if they use several state machines to tokenize common markup. The flow state machine is used to tokenize the line constructs that make up the structure of the document. The content state machine is used to tokenize the inline constructs that are part of content blocks. The text state machine is used to tokenize the inline constructs that are part of rich or plain text.

Most states consume the input character, and either remain in the state to consume the next character, reconsume the input character in a different state, or switch to a different state to consume the next character. States enqueue tokens and labels.

6.2 State

The shared space is a map of key/value pairs.

The queue is a list of tokens and labels that are enqueued. The current token is the last token in the queue.

6.3 Constructs

Markup is parsed per construct. Some constructs are considered regular (those from CommonMark, such as ATX headings) and other constructs are extensions (such as YAML frontmatter or MDX).

❗️ Define constructs.

6.4 Effects

6.4.1 Switch

To switch to a state is to wait for the next character in the given state.

6.4.2 Consume

To consume the input character affects the current token. Due to the nature of the state machine, it is not possible to consume if there is no current token.

6.4.3 Reconsume

To reconsume is to switch to the given state, and consume the input character there.
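
The difference between remaining in a state, switching, and reconsuming can be shown with a toy driver in Python. The two states below (`marker_state` and `rest_state`) are hypothetical and are not states defined by this document; each state receives the input character and reports the next state plus whether the character was consumed.

```python
EOF = None  # stand-in for the conceptual EOF character


def run(text, start_state):
    """Feed characters to states until a state returns no next state."""
    position = 0
    state = start_state
    trace = []
    while state is not None:
        character = text[position] if position < len(text) else EOF
        next_state, consumed = state(character, trace)
        if consumed:
            position += 1  # the next state sees the next character
        # Otherwise the same character is reconsumed in next_state.
        state = next_state
    return trace


def marker_state(character, trace):
    if character == "#":
        trace.append(("marker", character))
        return marker_state, True  # consume and remain in this state
    return rest_state, False  # reconsume in a different state


def rest_state(character, trace):
    if character is EOF:
        return None, False  # done
    trace.append(("rest", character))
    return rest_state, True
```

Running `run("##a", marker_state)` consumes both `#` characters in `marker_state`, then reconsumes `a` in `rest_state`.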

6.4.4 Enqueue

To enqueue a label is to mark a point between two tokens with a semantic name, at which point there is no current token.

To enqueue a token is to add a new token of the given type to the queue, making it the new current token.

6.4.5 Ensure

To ensure a token is to enqueue that token if the current token is not of the given type, and otherwise do nothing.
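
The enqueue, ensure, and consume effects can be sketched with a small queue in Python. The class and method names are hypothetical. Storing consumed characters on the current token is an assumption, since this document only says that consuming affects the current token.

```python
from dataclasses import dataclass, field


@dataclass
class Token:
    type: str
    characters: list = field(default_factory=list)


@dataclass
class Label:
    name: str


class Queue:
    def __init__(self):
        self.items = []

    @property
    def current_token(self):
        # After a label is enqueued there is no current token.
        if self.items and isinstance(self.items[-1], Token):
            return self.items[-1]
        return None

    def enqueue_label(self, name):
        self.items.append(Label(name))

    def enqueue_token(self, type_):
        self.items.append(Token(type_))

    def ensure_token(self, type_):
        current = self.current_token
        if current is None or current.type != type_:
            self.enqueue_token(type_)

    def consume(self, character):
        current = self.current_token
        if current is None:
            raise RuntimeError("cannot consume without a current token")
        # Assumption: the consumed character is recorded on the token.
        current.characters.append(character)
```

Ensuring the same token type twice in a row enqueues only one token, so a run of identical characters ends up in a single token.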

7 Flow state machine

The flow state machine is used to tokenize the line constructs that make up the structure of the document (such as headings or thematic breaks) and must start in the Flow prefix start state.

7.1 Flow prefix start state

7.2 Flow start state

Hookable, there are no regular hooks.

7.3 Flow prefix initial state

7.4 Flow initial state

❗️ Todo: Indented code vs. content

Hookable, the regular hooks are:

❗️ Todo, continuation:

7.5 Blank line state

7.6 ATX heading start state

7.7 ATX heading fence open inside state

7.8 ATX heading inside state

7.9 Thematic break asterisk start state

7.10 Thematic break asterisk inside state

7.11 Setext heading underline dash start state

❗️ Todo: exit if not preceded by content

7.12 Setext heading underline dash inside state

7.13 Setext heading underline dash after state

❗️ Todo: Close content if ok, create a new content if nok

7.14 Thematic break dash start state

7.15 Thematic break dash inside state

7.16 Flow HTML start state

7.17 Flow HTML tag open state

7.18 Flow HTML markup declaration open state

7.19 Flow HTML end tag open state

7.20 Flow HTML tag name state

7.21 Flow HTML basic self closing state

7.22 Flow HTML complete attribute name before state

7.23 Flow HTML complete attribute name state

7.24 Flow HTML complete attribute name after state

7.25 Flow HTML complete attribute value before state

7.26 Flow HTML complete attribute value double quoted state

7.27 Flow HTML complete attribute value single quoted state

7.28 Flow HTML complete attribute value unquoted state

7.29 Flow HTML complete self closing state

7.30 Flow HTML complete tag after state

7.31 Flow HTML continuation state

7.32 Flow HTML continuation comment inside state

7.33 Flow HTML continuation raw tag open state

7.34 Flow HTML continuation raw end tag state

Note: This state can be optimized by either imposing a maximum size (the size of the longest possible raw tag name) or by using a trie of the possible raw tag names.
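
Both optimizations mentioned in the note can be sketched in Python, using the raw tag names listed in the appendix (script, pre, and style). The helper names are hypothetical, and matching case-insensitively is an assumption of this sketch.

```python
RAW_TAGS = ("script", "pre", "style")

# Option 1: a maximum size, so longer candidates can be rejected early.
MAX_RAW_TAG_SIZE = max(len(name) for name in RAW_TAGS)


def build_trie(names):
    """Build a nested-dict trie; the empty-string key marks a complete name."""
    root = {}
    for name in names:
        node = root
        for char in name:
            node = node.setdefault(char, {})
        node[""] = True
    return root


def is_raw_tag(trie, candidate):
    # Option 2: walk the trie and stop at the first character that
    # cannot continue any raw tag name.
    node = trie
    for char in candidate.lower():  # assumption: case-insensitive
        node = node.get(char)
        if node is None:
            return False
    return "" in node
```

With the trie, a candidate such as `prefix` is rejected at its fourth character, without buffering the whole name first.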

7.35 Flow HTML continuation character data inside state

7.36 Flow HTML continuation declaration before state

7.37 Flow HTML continuation close state

7.38 Setext heading underline equals to start state

❗️ Todo: exit if not preceded by content

7.39 Setext heading underline equals to inside state

7.40 Setext heading underline equals to after state

❗️ Todo: Close content if ok, create a new content if nok

7.41 Thematic break underscore start state

7.42 Thematic break underscore inside state

7.43 Fenced code grave accent start state

7.44 Fenced code grave accent open fence inside state

7.45 Fenced code grave accent open fence after state

7.46 Fenced code grave accent continuation state

7.47 Fenced code grave accent close fence inside state

7.48 Fenced code grave accent close fence after state

7.49 Fenced code grave accent continuation inside state

7.50 Fenced code tilde start state

7.51 Fenced code tilde open fence inside state

7.52 Fenced code tilde open fence after state

7.53 Fenced code tilde continuation state

7.54 Fenced code tilde close fence inside state

7.55 Fenced code tilde close fence after state

7.56 Fenced code tilde continuation inside state

7.57 Flow content state

8 Content state machine

The content state machine is used to tokenize the inline constructs that are part of content blocks in a document (such as regular definitions and phrasing) and must start in the Content start state.

8.1 Content start state

Hookable, the regular hooks are:

8.2 Content initial state

Hookable, there are no regular hooks.

8.3 Definition label start state

8.4 Definition label before state

8.5 Definition label inside state

8.6 Definition label between state

8.7 Definition label escape state

8.8 Definition label after state

8.9 Definition destination before state

8.10 Definition destination quoted inside state

8.11 Definition destination quoted escape state

8.12 Definition destination unquoted inside state

8.13 Definition destination unquoted escape state

8.14 Definition destination after state

8.15 Definition title before state

8.16 Definition title or label before state

8.17 Definition title double quoted state

8.18 Definition title double quoted between state

8.19 Definition title double quoted escape state

8.20 Definition title single quoted state

8.21 Definition title single quoted between state

8.22 Definition title single quoted escape state

8.23 Definition title paren quoted state

8.24 Definition title paren quoted between state

8.25 Definition title paren quoted escape state

8.26 Definition title after state

8.27 Phrasing content state

9 Text state machine

The text state machine is used to tokenize the inline constructs that are part of rich text (such as regular resources and emphasis) or plain text (such as regular character escapes or character references) in a document and must start in the Text start state.

If text is parsed as plain text, the Text start state, Text initial state, and Text state all forward to the Plain text state.

If text is parsed as rich text, an additional variable prev must be tracked. Initially set to EOF, it must be set to the input character right before a character is consumed.

9.1 Text start state

Hookable, there are no regular hooks.

9.2 Text initial state

Hookable, there are no regular hooks.

9.3 Text state

Hookable, the regular hooks are:

9.4 Plain text state

Hookable, the regular hooks are:

9.5 End-of-line state

9.6 Plain end-of-line state

9.7 Image label start state

9.8 Image label start after state

9.9 Character reference state

9.10 Character reference start after state

9.11 Character reference named state

Note: This state can be optimized by either imposing a maximum size (the size of the longest possible named character reference) or by using a trie of the possible named character references.

9.12 Character reference numeric state

9.13 Character reference hexadecimal start state

9.14 Character reference hexadecimal state

Note: This state can be optimized by imposing a maximum size (the size of the longest possible valid hexadecimal character reference, 6).

9.15 Character reference decimal state

Note: This state can be optimized by imposing a maximum size (the size of the longest possible valid decimal character reference, 7).
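
The maximum sizes in the notes for the hexadecimal and decimal states (6 and 7) coincide with the number of digits needed to write the highest Unicode code point, U+10FFFF, in each base. The Python sketch below derives the two limits that way and applies them while decoding. The function name is hypothetical, and returning `None` for rejected input is an assumption of this sketch, not behavior defined by this document.

```python
MAX_CODE_POINT = 0x10FFFF
HEX_MAX_SIZE = len(format(MAX_CODE_POINT, "x"))  # digits in "10ffff"
DECIMAL_MAX_SIZE = len(str(MAX_CODE_POINT))  # digits in "1114111"


def decode_numeric_reference(digits: str, base: int):
    """Decode the digits of a numeric character reference, or return
    None when they are empty, too long, or not valid in the base."""
    limit = HEX_MAX_SIZE if base == 16 else DECIMAL_MAX_SIZE
    alphabet = "0123456789abcdef"[:base]
    if not 1 <= len(digits) <= limit:
        return None
    value = 0
    for char in digits.lower():
        index = alphabet.find(char)
        if index < 0:
            return None
        value = value * base + index
    return value
```

The size limit alone does not guarantee a valid code point: a seven-digit decimal value can still exceed U+10FFFF, so a range check would be needed separately.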

9.16 Delimiter run asterisk start state

9.17 Delimiter run asterisk state

9.18 Autolink state

9.19 Autolink open state

9.20 Autolink email atext state

9.21 Autolink email label state

9.22 Autolink email at sign or dot state

9.23 Autolink email dash state

9.24 Autolink scheme or email atext state

9.25 Autolink scheme inside or email atext state

9.26 Autolink URI inside state

9.27 HTML state

9.28 HTML open state

9.29 HTML declaration start state

9.30 HTML comment open inside state

9.31 HTML comment inside state

9.32 HTML comment close inside state

9.33 HTML comment close state

Note: a CM comment may not contain two dashes (--), and may not end in a dash (which would result in --->). Here we have seen two dashes, so we can either be at the end of a comment, or no longer in a comment.

9.34 HTML CDATA inside state

9.35 HTML declaration inside state

9.36 HTML instruction inside state

9.37 HTML instruction close state

9.38 HTML tag close start state

9.39 HTML tag close inside state

9.40 HTML tag close between state

Note: an EOL is technically allowed here, but as a > after an EOL would start a blockquote, practically it’s not possible.

9.41 HTML tag open inside state

9.42 HTML tag open between state

9.43 HTML tag open self closing state

9.44 HTML tag open attribute name state

9.45 HTML tag open attribute name after state

9.46 HTML tag open attribute before state

9.47 HTML tag open double quoted attribute state

9.48 HTML tag open single quoted attribute state

9.49 HTML tag open unquoted attribute state

9.50 Link label start state

9.51 Character escape state

9.52 Character escape after state

9.53 Break escape state

9.54 Break escape after state

9.55 Label resource close state

9.56 Label resource end after state

9.57 Resource information open state

9.58 Resource information destination quoted inside state

9.59 Resource information destination quoted escape state

9.60 Resource information destination quoted after state

9.61 Resource information destination unquoted inside state

9.62 Resource information destination unquoted escape state

9.63 Resource information between state

9.64 Resource information title double quoted inside state

9.65 Resource information title double quoted escape state

9.66 Resource information title single quoted inside state

9.67 Resource information title single quoted escape state

9.68 Resource information title paren quoted inside state

9.69 Resource information title paren quoted escape state

9.70 Resource information title after state

9.71 Label reference close state

9.72 Label reference end after state

9.73 Reference label open state

9.74 Reference label inside state

9.75 Reference label between state

9.76 Reference label escape state

9.77 Label reference shortcut close state

9.78 Delimiter run underscore start state

9.79 Delimiter run underscore state

9.80 Code start state

9.81 Code open fence inside state

9.82 Code between state

9.83 Code inside state

9.84 Code close fence inside state

10 Labels

10.1 NOK label

10.2 Blank line end label

10.3 ATX heading start label

10.4 ATX heading fence start label

10.5 ATX heading fence end label

10.6 ATX heading end label

10.7 Thematic break start label

10.8 Thematic break end label

10.9 Setext heading underline start label

10.10 Setext heading underline end label

10.11 Fenced code start label

10.12 Fenced code fence start label

10.13 Fenced code fence sequence start label

10.14 Fenced code fence sequence end label

10.15 Fenced code fence end label

10.16 Fenced code end label

10.17 Content definition partial label

10.18 Content definition start label

10.19 Content definition label start label

10.20 Content definition label open label

10.21 Content definition label close label

10.22 Content definition label end label

10.23 Content definition destination start label

10.24 Content definition destination quoted open label

10.25 Content definition destination quoted close label

10.26 Content definition destination unquoted open label

10.27 Content definition destination unquoted close label

10.28 Content definition destination end label

10.29 Content definition title start label

10.30 Content definition title open label

10.31 Content definition title close label

10.32 Content definition title end label

10.33 Content definition end label

10.34 Hard break label

10.35 Soft break label

10.36 Image label start label

10.37 Image label open label

10.38 Character reference start label

10.39 Character reference end label

10.40 Delimiter run start label

10.41 Delimiter run end label

10.42 Autolink start label

10.43 Autolink open label

10.44 Autolink email close label

10.45 Autolink email end label

10.46 Autolink uri close label

10.47 Autolink uri end label

10.48 HTML start label

10.49 HTML end label

10.50 Link label start label

10.51 Link label open label

10.52 Character escape start label

10.53 Character escape end label

10.54 Break escape start label

10.55 Break escape end label

10.56 Label close label

10.57 Label end label

10.58 Resource information start label

10.59 Resource information open label

10.60 Resource information destination quoted start label

10.61 Resource information destination quoted open label

10.62 Resource information destination quoted close label

10.63 Resource information destination quoted end label

10.64 Resource information destination unquoted start label

10.65 Resource information destination unquoted open label

10.66 Resource information destination unquoted close label

10.67 Resource information destination unquoted end label

10.68 Resource information title start label

10.69 Resource information title open label

10.70 Resource information title close label

10.71 Resource information title end label

10.72 Resource information close label

10.73 Resource information end label

10.74 Reference label start label

10.75 Reference label open label

10.76 Reference label collapsed close label

10.77 Reference label full close label

10.78 Reference label end label

10.79 Code start label

10.80 Code fence start label

10.81 Code fence end label

10.82 Code end label

11 Tokens

11.1 Whitespace token

A Whitespace token represents inline whitespace that is part of syntax instead of content.

interface Whitespace <: Token {
  size: number
  used: number
  characters: [Character]
}
{
  type: 'whitespace',
  characters: [9],
  size: 3,
  used: 0
}

11.2 Line terminator token

A Line terminator token represents a line break.

interface LineEnding <: Token {}
{type: 'lineEnding'}

11.3 End-of-file token

An End-of-file token represents the end of the syntax.

interface EndOfFile <: Token {}
{type: 'endOfFile'}

11.4 End-of-line token

An End-of-line token represents a point between two runs of text in content.

interface EndOfLine <: Token {}
{type: 'endOfLine'}

11.5 Marker token

A Marker token represents one punctuation character that is part of syntax instead of content.

interface Marker <: Token {}
{type: 'marker'}

11.6 Sequence token

A Sequence token represents one or more of the same punctuation characters that are part of syntax instead of content.

interface Sequence <: Token {
  size: number
}
{type: 'sequence', size: 3}

11.7 Content token

A Content token represents content.

interface Content <: Token {
  prefix: string
}
{type: 'content', prefix: '  '}
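
For readers who prefer a concrete data structure, the token interfaces above could be mirrored with Python dataclasses. This is only an illustrative mapping: the field names follow the interfaces and example values shown in this section, while the use of dataclasses and of a default `type` field is an assumption of the sketch.

```python
from dataclasses import asdict, dataclass


@dataclass
class Whitespace:
    size: int
    used: int
    characters: list
    type: str = "whitespace"


@dataclass
class Marker:
    type: str = "marker"


@dataclass
class Sequence:
    size: int
    type: str = "sequence"


@dataclass
class Content:
    prefix: str
    type: str = "content"


# The whitespace example from this section: one tab with a size of 3.
example = asdict(Whitespace(size=3, used=0, characters=[9]))
```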

12 Appendix

12.1 Raw tags

A raw tag is one of: script, pre, and style.

12.2 Basic tags

A basic tag is one of: address, article, aside, base, basefont, blockquote, body, caption, center, col, colgroup, dd, details, dialog, dir, div, dl, dt, fieldset, figcaption, figure, footer, form, frame, frameset, h1, h2, h3, h4, h5, h6, head, header, hr, html, iframe, legend, li, link, main, menu, menuitem, nav, noframes, ol, optgroup, option, p, param, section, source, summary, table, tbody, td, tfoot, th, thead, title, tr, track, and ul.

12.3 Named character references

A character reference name is one of: AEli, AElig, AM, AMP, Aacut, Aacute, Abreve, Acir, Acirc, Acy, Afr, Agrav, Agrave, Alpha, Amacr, And, Aogon, Aopf, ApplyFunction, Arin, Aring, Ascr, Assign, Atild, Atilde, Aum, Auml, Backslash, Barv, Barwed, Bcy, Because, Bernoullis, Beta, Bfr, Bopf, Breve, Bscr, Bumpeq, CHcy, COP, COPY, Cacute, Cap, CapitalDifferentialD, Cayleys, Ccaron, Ccedi, Ccedil, Ccirc, Cconint, Cdot, Cedilla, CenterDot, Cfr, Chi, CircleDot, CircleMinus, CirclePlus, CircleTimes, ClockwiseContourIntegral, CloseCurlyDoubleQuote, CloseCurlyQuote, Colon, Colone, Congruent, Conint, ContourIntegral, Copf, Coproduct, CounterClockwiseContourIntegral, Cross, Cscr, Cup, CupCap, DD, DDotrahd, DJcy, DScy, DZcy, Dagger, Darr, Dashv, Dcaron, Dcy, Del, Delta, Dfr, DiacriticalAcute, DiacriticalDot, DiacriticalDoubleAcute, DiacriticalGrave, DiacriticalTilde, Diamond, DifferentialD, Dopf, Dot, DotDot, DotEqual, DoubleContourIntegral, DoubleDot, DoubleDownArrow, DoubleLeftArrow, DoubleLeftRightArrow, DoubleLeftTee, DoubleLongLeftArrow, DoubleLongLeftRightArrow, DoubleLongRightArrow, DoubleRightArrow, DoubleRightTee, DoubleUpArrow, DoubleUpDownArrow, DoubleVerticalBar, DownArrow, DownArrowBar, DownArrowUpArrow, DownBreve, DownLeftRightVector, DownLeftTeeVector, DownLeftVector, DownLeftVectorBar, DownRightTeeVector, DownRightVector, DownRightVectorBar, DownTee, DownTeeArrow, Downarrow, Dscr, Dstrok, ENG, ET, ETH, Eacut, Eacute, Ecaron, Ecir, Ecirc, Ecy, Edot, Efr, Egrav, Egrave, Element, Emacr, EmptySmallSquare, EmptyVerySmallSquare, Eogon, Eopf, Epsilon, Equal, EqualTilde, Equilibrium, Escr, Esim, Eta, Eum, Euml, Exists, ExponentialE, Fcy, Ffr, FilledSmallSquare, FilledVerySmallSquare, Fopf, ForAll, Fouriertrf, Fscr, G, GJcy, GT, Gamma, Gammad, Gbreve, Gcedil, Gcirc, Gcy, Gdot, Gfr, Gg, Gopf, GreaterEqual, GreaterEqualLess, GreaterFullEqual, GreaterGreater, GreaterLess, GreaterSlantEqual, GreaterTilde, Gscr, Gt, HARDcy, Hacek, Hat, Hcirc, Hfr, HilbertSpace, Hopf, 
HorizontalLine, Hscr, Hstrok, HumpDownHump, HumpEqual, IEcy, IJlig, IOcy, Iacut, Iacute, Icir, Icirc, Icy, Idot, Ifr, Igrav, Igrave, Im, Imacr, ImaginaryI, Implies, Int, Integral, Intersection, InvisibleComma, InvisibleTimes, Iogon, Iopf, Iota, Iscr, Itilde, Iukcy, Ium, Iuml, Jcirc, Jcy, Jfr, Jopf, Jscr, Jsercy, Jukcy, KHcy, KJcy, Kappa, Kcedil, Kcy, Kfr, Kopf, Kscr, L, LJcy, LT, Lacute, Lambda, Lang, Laplacetrf, Larr, Lcaron, Lcedil, Lcy, LeftAngleBracket, LeftArrow, LeftArrowBar, LeftArrowRightArrow, LeftCeiling, LeftDoubleBracket, LeftDownTeeVector, LeftDownVector, LeftDownVectorBar, LeftFloor, LeftRightArrow, LeftRightVector, LeftTee, LeftTeeArrow, LeftTeeVector, LeftTriangle, LeftTriangleBar, LeftTriangleEqual, LeftUpDownVector, LeftUpTeeVector, LeftUpVector, LeftUpVectorBar, LeftVector, LeftVectorBar, Leftarrow, Leftrightarrow, LessEqualGreater, LessFullEqual, LessGreater, LessLess, LessSlantEqual, LessTilde, Lfr, Ll, Lleftarrow, Lmidot, LongLeftArrow, LongLeftRightArrow, LongRightArrow, Longleftarrow, Longleftrightarrow, Longrightarrow, Lopf, LowerLeftArrow, LowerRightArrow, Lscr, Lsh, Lstrok, Lt, Map, Mcy, MediumSpace, Mellintrf, Mfr, MinusPlus, Mopf, Mscr, Mu, NJcy, Nacute, Ncaron, Ncedil, Ncy, NegativeMediumSpace, NegativeThickSpace, NegativeThinSpace, NegativeVeryThinSpace, NestedGreaterGreater, NestedLessLess, NewLine, Nfr, NoBreak, NonBreakingSpace, Nopf, Not, NotCongruent, NotCupCap, NotDoubleVerticalBar, NotElement, NotEqual, NotEqualTilde, NotExists, NotGreater, NotGreaterEqual, NotGreaterFullEqual, NotGreaterGreater, NotGreaterLess, NotGreaterSlantEqual, NotGreaterTilde, NotHumpDownHump, NotHumpEqual, NotLeftTriangle, NotLeftTriangleBar, NotLeftTriangleEqual, NotLess, NotLessEqual, NotLessGreater, NotLessLess, NotLessSlantEqual, NotLessTilde, NotNestedGreaterGreater, NotNestedLessLess, NotPrecedes, NotPrecedesEqual, NotPrecedesSlantEqual, NotReverseElement, NotRightTriangle, NotRightTriangleBar, NotRightTriangleEqual, NotSquareSubset, 
NotSquareSubsetEqual, NotSquareSuperset, NotSquareSupersetEqual, NotSubset, NotSubsetEqual, NotSucceeds, NotSucceedsEqual, NotSucceedsSlantEqual, NotSucceedsTilde, NotSuperset, NotSupersetEqual, NotTilde, NotTildeEqual, NotTildeFullEqual, NotTildeTilde, NotVerticalBar, Nscr, Ntild, Ntilde, Nu, OElig, Oacut, Oacute, Ocir, Ocirc, Ocy, Odblac, Ofr, Ograv, Ograve, Omacr, Omega, Omicron, Oopf, OpenCurlyDoubleQuote, OpenCurlyQuote, Or, Oscr, Oslas, Oslash, Otild, Otilde, Otimes, Oum, Ouml, OverBar, OverBrace, OverBracket, OverParenthesis, PartialD, Pcy, Pfr, Phi, Pi, PlusMinus, Poincareplane, Popf, Pr, Precedes, PrecedesEqual, PrecedesSlantEqual, PrecedesTilde, Prime, Product, Proportion, Proportional, Pscr, Psi, QUO, QUOT, Qfr, Qopf, Qscr, RBarr, RE, REG, Racute, Rang, Rarr, Rarrtl, Rcaron, Rcedil, Rcy, Re, ReverseElement, ReverseEquilibrium, ReverseUpEquilibrium, Rfr, Rho, RightAngleBracket, RightArrow, RightArrowBar, RightArrowLeftArrow, RightCeiling, RightDoubleBracket, RightDownTeeVector, RightDownVector, RightDownVectorBar, RightFloor, RightTee, RightTeeArrow, RightTeeVector, RightTriangle, RightTriangleBar, RightTriangleEqual, RightUpDownVector, RightUpTeeVector, RightUpVector, RightUpVectorBar, RightVector, RightVectorBar, Rightarrow, Ropf, RoundImplies, Rrightarrow, Rscr, Rsh, RuleDelayed, SHCHcy, SHcy, SOFTcy, Sacute, Sc, Scaron, Scedil, Scirc, Scy, Sfr, ShortDownArrow, ShortLeftArrow, ShortRightArrow, ShortUpArrow, Sigma, SmallCircle, Sopf, Sqrt, Square, SquareIntersection, SquareSubset, SquareSubsetEqual, SquareSuperset, SquareSupersetEqual, SquareUnion, Sscr, Star, Sub, Subset, SubsetEqual, Succeeds, SucceedsEqual, SucceedsSlantEqual, SucceedsTilde, SuchThat, Sum, Sup, Superset, SupersetEqual, Supset, THOR, THORN, TRADE, TSHcy, TScy, Tab, Tau, Tcaron, Tcedil, Tcy, Tfr, Therefore, Theta, ThickSpace, ThinSpace, Tilde, TildeEqual, TildeFullEqual, TildeTilde, Topf, TripleDot, Tscr, Tstrok, Uacut, Uacute, Uarr, Uarrocir, Ubrcy, Ubreve, Ucir, Ucirc, Ucy, Udblac, 
Ufr, Ugrav, Ugrave, Umacr, UnderBar, UnderBrace, UnderBracket, UnderParenthesis, Union, UnionPlus, Uogon, Uopf, UpArrow, UpArrowBar, UpArrowDownArrow, UpDownArrow, UpEquilibrium, UpTee, UpTeeArrow, Uparrow, Updownarrow, UpperLeftArrow, UpperRightArrow, Upsi, Upsilon, Uring, Uscr, Utilde, Uum, Uuml, VDash, Vbar, Vcy, Vdash, Vdashl, Vee, Verbar, Vert, VerticalBar, VerticalLine, VerticalSeparator, VerticalTilde, VeryThinSpace, Vfr, Vopf, Vscr, Vvdash, Wcirc, Wedge, Wfr, Wopf, Wscr, Xfr, Xi, Xopf, Xscr, YAcy, YIcy, YUcy, Yacut, Yacute, Ycirc, Ycy, Yfr, Yopf, Yscr, Yuml, ZHcy, Zacute, Zcaron, Zcy, Zdot, ZeroWidthSpace, Zeta, Zfr, Zopf, Zscr, aacut, aacute, abreve, ac, acE, acd, acir, acirc, acut, acute, acy, aeli, aelig, af, afr, agrav, agrave, alefsym, aleph, alpha, am, amacr, amalg, amp, and, andand, andd, andslope, andv, ang, ange, angle, angmsd, angmsdaa, angmsdab, angmsdac, angmsdad, angmsdae, angmsdaf, angmsdag, angmsdah, angrt, angrtvb, angrtvbd, angsph, angst, angzarr, aogon, aopf, ap, apE, apacir, ape, apid, apos, approx, approxeq, arin, aring, ascr, ast, asymp, asympeq, atild, atilde, aum, auml, awconint, awint, bNot, backcong, backepsilon, backprime, backsim, backsimeq, barvee, barwed, barwedge, bbrk, bbrktbrk, bcong, bcy, bdquo, becaus, because, bemptyv, bepsi, bernou, beta, beth, between, bfr, bigcap, bigcirc, bigcup, bigodot, bigoplus, bigotimes, bigsqcup, bigstar, bigtriangledown, bigtriangleup, biguplus, bigvee, bigwedge, bkarow, blacklozenge, blacksquare, blacktriangle, blacktriangledown, blacktriangleleft, blacktriangleright, blank, blk12, blk14, blk34, block, bne, bnequiv, bnot, bopf, bot, bottom, bowtie, boxDL, boxDR, boxDl, boxDr, boxH, boxHD, boxHU, boxHd, boxHu, boxUL, boxUR, boxUl, boxUr, boxV, boxVH, boxVL, boxVR, boxVh, boxVl, boxVr, boxbox, boxdL, boxdR, boxdl, boxdr, boxh, boxhD, boxhU, boxhd, boxhu, boxminus, boxplus, boxtimes, boxuL, boxuR, boxul, boxur, boxv, boxvH, boxvL, boxvR, boxvh, boxvl, boxvr, bprime, breve, brvba, brvbar, bscr, 
bsemi, bsim, bsime, bsol, bsolb, bsolhsub, bull, bullet, bump, bumpE, bumpe, bumpeq, cacute, cap, capand, capbrcup, capcap, capcup, capdot, caps, caret, caron, ccaps, ccaron, ccedi, ccedil, ccirc, ccups, ccupssm, cdot, cedi, cedil, cemptyv, cen, cent, centerdot, cfr, chcy, check, checkmark, chi, cir, cirE, circ, circeq, circlearrowleft, circlearrowright, circledR, circledS, circledast, circledcirc, circleddash, cire, cirfnint, cirmid, cirscir, clubs, clubsuit, colon, colone, coloneq, comma, commat, comp, compfn, complement, complexes, cong, congdot, conint, cop, copf, coprod, copy, copysr, crarr, cross, cscr, csub, csube, csup, csupe, ctdot, cudarrl, cudarrr, cuepr, cuesc, cularr, cularrp, cup, cupbrcap, cupcap, cupcup, cupdot, cupor, cups, curarr, curarrm, curlyeqprec, curlyeqsucc, curlyvee, curlywedge, curre, curren, curvearrowleft, curvearrowright, cuvee, cuwed, cwconint, cwint, cylcty, dArr, dHar, dagger, daleth, darr, dash, dashv, dbkarow, dblac, dcaron, dcy, dd, ddagger, ddarr, ddotseq, de, deg, delta, demptyv, dfisht, dfr, dharl, dharr, diam, diamond, diamondsuit, diams, die, digamma, disin, div, divid, divide, divideontimes, divonx, djcy, dlcorn, dlcrop, dollar, dopf, dot, doteq, doteqdot, dotminus, dotplus, dotsquare, doublebarwedge, downarrow, downdownarrows, downharpoonleft, downharpoonright, drbkarow, drcorn, drcrop, dscr, dscy, dsol, dstrok, dtdot, dtri, dtrif, duarr, duhar, dwangle, dzcy, dzigrarr, eDDot, eDot, eacut, eacute, easter, ecaron, ecir, ecir, ecirc, ecolon, ecy, edot, ee, efDot, efr, eg, egrav, egrave, egs, egsdot, el, elinters, ell, els, elsdot, emacr, empty, emptyset, emptyv, emsp, emsp13, emsp14, eng, ensp, eogon, eopf, epar, eparsl, eplus, epsi, epsilon, epsiv, eqcirc, eqcolon, eqsim, eqslantgtr, eqslantless, equals, equest, equiv, equivDD, eqvparsl, erDot, erarr, escr, esdot, esim, et, eta, eth, eum, euml, euro, excl, exist, expectation, exponentiale, fallingdotseq, fcy, female, ffilig, fflig, ffllig, ffr, filig, fjlig, flat, fllig, 
fltns, fnof, fopf, forall, fork, forkv, fpartint, frac1, frac1, frac12, frac13, frac14, frac15, frac16, frac18, frac23, frac25, frac3, frac34, frac35, frac38, frac45, frac56, frac58, frac78, frasl, frown, fscr, g, gE, gEl, gacute, gamma, gammad, gap, gbreve, gcirc, gcy, gdot, ge, gel, geq, geqq, geqslant, ges, gescc, gesdot, gesdoto, gesdotol, gesl, gesles, gfr, gg, ggg, gimel, gjcy, gl, glE, gla, glj, gnE, gnap, gnapprox, gne, gneq, gneqq, gnsim, gopf, grave, gscr, gsim, gsime, gsiml, gt, gtcc, gtcir, gtdot, gtlPar, gtquest, gtrapprox, gtrarr, gtrdot, gtreqless, gtreqqless, gtrless, gtrsim, gvertneqq, gvnE, hArr, hairsp, half, hamilt, hardcy, harr, harrcir, harrw, hbar, hcirc, hearts, heartsuit, hellip, hercon, hfr, hksearow, hkswarow, hoarr, homtht, hookleftarrow, hookrightarrow, hopf, horbar, hscr, hslash, hstrok, hybull, hyphen, iacut, iacute, ic, icir, icirc, icy, iecy, iexc, iexcl, iff, ifr, igrav, igrave, ii, iiiint, iiint, iinfin, iiota, ijlig, imacr, image, imagline, imagpart, imath, imof, imped, in, incare, infin, infintie, inodot, int, intcal, integers, intercal, intlarhk, intprod, iocy, iogon, iopf, iota, iprod, iques, iquest, iscr, isin, isinE, isindot, isins, isinsv, isinv, it, itilde, iukcy, ium, iuml, jcirc, jcy, jfr, jmath, jopf, jscr, jsercy, jukcy, kappa, kappav, kcedil, kcy, kfr, kgreen, khcy, kjcy, kopf, kscr, l, lAarr, lArr, lAtail, lBarr, lE, lEg, lHar, lacute, laemptyv, lagran, lambda, lang, langd, langle, lap, laqu, laquo, larr, larrb, larrbfs, larrfs, larrhk, larrlp, larrpl, larrsim, larrtl, lat, latail, late, lates, lbarr, lbbrk, lbrace, lbrack, lbrke, lbrksld, lbrkslu, lcaron, lcedil, lceil, lcub, lcy, ldca, ldquo, ldquor, ldrdhar, ldrushar, ldsh, le, leftarrow, leftarrowtail, leftharpoondown, leftharpoonup, leftleftarrows, leftrightarrow, leftrightarrows, leftrightharpoons, leftrightsquigarrow, leftthreetimes, leg, leq, leqq, leqslant, les, lescc, lesdot, lesdoto, lesdotor, lesg, lesges, lessapprox, lessdot, lesseqgtr, lesseqqgtr, 
lessgtr, lesssim, lfisht, lfloor, lfr, lg, lgE, lhard, lharu, lharul, lhblk, ljcy, ll, llarr, llcorner, llhard, lltri, lmidot, lmoust, lmoustache, lnE, lnap, lnapprox, lne, lneq, lneqq, lnsim, loang, loarr, lobrk, longleftarrow, longleftrightarrow, longmapsto, longrightarrow, looparrowleft, looparrowright, lopar, lopf, loplus, lotimes, lowast, lowbar, loz, lozenge, lozf, lpar, lparlt, lrarr, lrcorner, lrhar, lrhard, lrm, lrtri, lsaquo, lscr, lsh, lsim, lsime, lsimg, lsqb, lsquo, lsquor, lstrok, lt, ltcc, ltcir, ltdot, lthree, ltimes, ltlarr, ltquest, ltrPar, ltri, ltrie, ltrif, lurdshar, luruhar, lvertneqq, lvnE, mDDot, mac, macr, male, malt, maltese, map, mapsto, mapstodown, mapstoleft, mapstoup, marker, mcomma, mcy, mdash, measuredangle, mfr, mho, micr, micro, mid, midast, midcir, middo, middot, minus, minusb, minusd, minusdu, mlcp, mldr, mnplus, models, mopf, mp, mscr, mstpos, mu, multimap, mumap, nGg, nGt, nGtv, nLeftarrow, nLeftrightarrow, nLl, nLt, nLtv, nRightarrow, nVDash, nVdash, nabla, nacute, nang, nap, napE, napid, napos, napprox, natur, natural, naturals, nbs, nbsp, nbump, nbumpe, ncap, ncaron, ncedil, ncong, ncongdot, ncup, ncy, ndash, ne, neArr, nearhk, nearr, nearrow, nedot, nequiv, nesear, nesim, nexist, nexists, nfr, ngE, nge, ngeq, ngeqq, ngeqslant, nges, ngsim, ngt, ngtr, nhArr, nharr, nhpar, ni, nis, nisd, niv, njcy, nlArr, nlE, nlarr, nldr, nle, nleftarrow, nleftrightarrow, nleq, nleqq, nleqslant, nles, nless, nlsim, nlt, nltri, nltrie, nmid, no, nopf, not, notin, notinE, notindot, notinva, notinvb, notinvc, notni, notniva, notnivb, notnivc, npar, nparallel, nparsl, npart, npolint, npr, nprcue, npre, nprec, npreceq, nrArr, nrarr, nrarrc, nrarrw, nrightarrow, nrtri, nrtrie, nsc, nsccue, nsce, nscr, nshortmid, nshortparallel, nsim, nsime, nsimeq, nsmid, nspar, nsqsube, nsqsupe, nsub, nsubE, nsube, nsubset, nsubseteq, nsubseteqq, nsucc, nsucceq, nsup, nsupE, nsupe, nsupset, nsupseteq, nsupseteqq, ntgl, ntild, ntilde, ntlg, ntriangleleft, 
ntrianglelefteq, ntriangleright, ntrianglerighteq, nu, num, numero, numsp, nvDash, nvHarr, nvap, nvdash, nvge, nvgt, nvinfin, nvlArr, nvle, nvlt, nvltrie, nvrArr, nvrtrie, nvsim, nwArr, nwarhk, nwarr, nwarrow, nwnear, oS, oacut, oacute, oast, ocir, ocir, ocirc, ocy, odash, odblac, odiv, odot, odsold, oelig, ofcir, ofr, ogon, ograv, ograve, ogt, ohbar, ohm, oint, olarr, olcir, olcross, oline, olt, omacr, omega, omicron, omid, ominus, oopf, opar, operp, oplus, or, orarr, ord, ord, ord, order, orderof, ordf, ordm, origof, oror, orslope, orv, oscr, oslas, oslash, osol, otild, otilde, otimes, otimesas, oum, ouml, ovbar, par, par, para, parallel, parsim, parsl, part, pcy, percnt, period, permil, perp, pertenk, pfr, phi, phiv, phmmat, phone, pi, pitchfork, piv, planck, planckh, plankv, plus, plusacir, plusb, pluscir, plusdo, plusdu, pluse, plusm, plusmn, plussim, plustwo, pm, pointint, popf, poun, pound, pr, prE, prap, prcue, pre, prec, precapprox, preccurlyeq, preceq, precnapprox, precneqq, precnsim, precsim, prime, primes, prnE, prnap, prnsim, prod, profalar, profline, profsurf, prop, propto, prsim, prurel, pscr, psi, puncsp, qfr, qint, qopf, qprime, qscr, quaternions, quatint, quest, questeq, quo, quot, rAarr, rArr, rAtail, rBarr, rHar, race, racute, radic, raemptyv, rang, rangd, range, rangle, raqu, raquo, rarr, rarrap, rarrb, rarrbfs, rarrc, rarrfs, rarrhk, rarrlp, rarrpl, rarrsim, rarrtl, rarrw, ratail, ratio, rationals, rbarr, rbbrk, rbrace, rbrack, rbrke, rbrksld, rbrkslu, rcaron, rcedil, rceil, rcub, rcy, rdca, rdldhar, rdquo, rdquor, rdsh, re, real, realine, realpart, reals, rect, reg, rfisht, rfloor, rfr, rhard, rharu, rharul, rho, rhov, rightarrow, rightarrowtail, rightharpoondown, rightharpoonup, rightleftarrows, rightleftharpoons, rightrightarrows, rightsquigarrow, rightthreetimes, ring, risingdotseq, rlarr, rlhar, rlm, rmoust, rmoustache, rnmid, roang, roarr, robrk, ropar, ropf, roplus, rotimes, rpar, rpargt, rppolint, rrarr, rsaquo, rscr, rsh, rsqb, rsquo, 
rsquor, rthree, rtimes, rtri, rtrie, rtrif, rtriltri, ruluhar, rx, sacute, sbquo, sc, scE, scap, scaron, sccue, sce, scedil, scirc, scnE, scnap, scnsim, scpolint, scsim, scy, sdot, sdotb, sdote, seArr, searhk, searr, searrow, sec, sect, semi, seswar, setminus, setmn, sext, sfr, sfrown, sh, sharp, shchcy, shcy, shortmid, shortparallel, shy, sigma, sigmaf, sigmav, sim, simdot, sime, simeq, simg, simgE, siml, simlE, simne, simplus, simrarr, slarr, smallsetminus, smashp, smeparsl, smid, smile, smt, smte, smtes, softcy, sol, solb, solbar, sopf, spades, spadesuit, spar, sqcap, sqcaps, sqcup, sqcups, sqsub, sqsube, sqsubset, sqsubseteq, sqsup, sqsupe, sqsupset, sqsupseteq, squ, square, squarf, squf, srarr, sscr, ssetmn, ssmile, sstarf, star, starf, straightepsilon, straightphi, strns, sub, subE, subdot, sube, subedot, submult, subnE, subne, subplus, subrarr, subset, subseteq, subseteqq, subsetneq, subsetneqq, subsim, subsub, subsup, succ, succapprox, succcurlyeq, succeq, succnapprox, succneqq, succnsim, succsim, sum, sung, sup, sup, sup, sup, sup1, sup2, sup3, supE, supdot, supdsub, supe, supedot, suphsol, suphsub, suplarr, supmult, supnE, supne, supplus, supset, supseteq, supseteqq, supsetneq, supsetneqq, supsim, supsub, supsup, swArr, swarhk, swarr, swarrow, swnwar, szli, szlig, target, tau, tbrk, tcaron, tcedil, tcy, tdot, telrec, tfr, there4, therefore, theta, thetasym, thetav, thickapprox, thicksim, thinsp, thkap, thksim, thor, thorn, tilde, time, times, timesb, timesbar, timesd, tint, toea, top, topbot, topcir, topf, topfork, tosa, tprime, trade, triangle, triangledown, triangleleft, trianglelefteq, triangleq, triangleright, trianglerighteq, tridot, trie, triminus, triplus, trisb, tritime, trpezium, tscr, tscy, tshcy, tstrok, twixt, twoheadleftarrow, twoheadrightarrow, uArr, uHar, uacut, uacute, uarr, ubrcy, ubreve, ucir, ucirc, ucy, udarr, udblac, udhar, ufisht, ufr, ugrav, ugrave, uharl, uharr, uhblk, ulcorn, ulcorner, ulcrop, ultri, um, umacr, uml, uogon, uopf, 
uparrow, updownarrow, upharpoonleft, upharpoonright, uplus, upsi, upsih, upsilon, upuparrows, urcorn, urcorner, urcrop, uring, urtri, uscr, utdot, utilde, utri, utrif, uuarr, uum, uuml, uwangle, vArr, vBar, vBarv, vDash, vangrt, varepsilon, varkappa, varnothing, varphi, varpi, varpropto, varr, varrho, varsigma, varsubsetneq, varsubsetneqq, varsupsetneq, varsupsetneqq, vartheta, vartriangleleft, vartriangleright, vcy, vdash, vee, veebar, veeeq, vellip, verbar, vert, vfr, vltri, vnsub, vnsup, vopf, vprop, vrtri, vscr, vsubnE, vsubne, vsupnE, vsupne, vzigzag, wcirc, wedbar, wedge, wedgeq, weierp, wfr, wopf, wp, wr, wreath, wscr, xcap, xcirc, xcup, xdtri, xfr, xhArr, xharr, xi, xlArr, xlarr, xmap, xnis, xodot, xopf, xoplus, xotime, xrArr, xrarr, xscr, xsqcup, xuplus, xutri, xvee, xwedge, yacut, yacute, yacy, ycirc, ycy, ye, yen, yfr, yicy, yopf, yscr, yucy, yum, yuml, zacute, zcaron, zcy, zdot, zeetrf, zeta, zfr, zhcy, zigrarr, zopf, zscr, zwj, or zwnj.
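
The list above mixes names that differ only in case (for example `Sc` and `sc`), so a lookup against it presumably has to be an exact, case-sensitive match. A minimal sketch of such a lookup follows. It assumes a small, hypothetical subset of the names (`CHARACTER_REFERENCE_NAMES`) and a hypothetical helper (`is_character_reference_name`), neither of which is defined by this document, and a real parser would load the full list:

```python
# Hypothetical subset of the names listed above, for illustration only.
# A real implementation would include every name in the list.
CHARACTER_REFERENCE_NAMES = frozenset(
    {"Sc", "sc", "Tab", "amp", "gt", "lt", "zwj", "zwnj"}
)


def is_character_reference_name(name: str) -> bool:
    """Return True if `name` exactly matches a listed name (case-sensitive)."""
    return name in CHARACTER_REFERENCE_NAMES
```

With this subset, `is_character_reference_name("Tab")` holds while `is_character_reference_name("tab")` does not, because only the capitalized form appears in the list.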

13 References

14 Acknowledgments

Thanks to John Gruber for inventing Markdown.

Thanks to John MacFarlane for defining CommonMark.

Thanks to ZEIT, Inc., Gatsby, Inc., Netlify, Inc., Holloway, Inc., and the many other organizations and individuals for financial support through OpenCollective.

15 License

Copyright © 2019 Titus Wormer. This work is licensed under a Creative Commons Attribution 4.0 International License.