syntax-tree / mdast-util-to-markdown

mdast utility to serialize markdown
http://unifiedjs.com
MIT License

mdast-util-to-markdown

mdast utility that turns a syntax tree into markdown.

What is this?

This package is a utility that takes an mdast syntax tree as input and turns it into serialized markdown.

This utility is a low-level project. It’s used in remark-stringify, which focuses on making it easier to transform content by abstracting these internals away.

When should I use this?

If you want to handle syntax trees manually, use this. For an easier time processing content, use the remark ecosystem instead.

You can combine this utility with other utilities to add syntax extensions. Notable examples that deeply integrate with it are mdast-util-gfm, mdast-util-mdx, mdast-util-frontmatter, mdast-util-math, and mdast-util-directive.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install mdast-util-to-markdown

In Deno with esm.sh:

import {toMarkdown} from 'https://esm.sh/mdast-util-to-markdown@2'

In browsers with esm.sh:

<script type="module">
  import {toMarkdown} from 'https://esm.sh/mdast-util-to-markdown@2?bundle'
</script>

Use

Say our module example.js looks as follows:

/**
 * @import {Root} from 'mdast'
 */

import {toMarkdown} from 'mdast-util-to-markdown'

/** @type {Root} */
const tree = {
  type: 'root',
  children: [
    {
      type: 'blockquote',
      children: [
        {type: 'thematicBreak'},
        {
          type: 'paragraph',
          children: [
            {type: 'text', value: '- a\nb !'},
            {
              type: 'link',
              url: 'example.com',
              children: [{type: 'text', value: 'd'}]
            }
          ]
        }
      ]
    }
  ]
}

console.log(toMarkdown(tree))

…now running node example.js yields:

> ***
>
> \- a
> b \![d](example.com)

👉 Note: observe the properly escaped characters which would otherwise turn into a list and image respectively.

API

This package exports the identifiers defaultHandlers and toMarkdown. There is no default export.

toMarkdown(tree[, options])

Turn an mdast syntax tree into markdown.

Parameters

tree (Node): tree to serialize
options (Options, optional): configuration

Returns

Serialized markdown representing tree (string).

defaultHandlers

Default (CommonMark) handlers (Handlers).

ConstructName

Construct names for things generated by mdast-util-to-markdown (TypeScript type).

This is an enum of strings, each a semantic label. These labels are useful when serializing to know, for example, whether we’re in a double (") or single (') quoted title.

Type
type ConstructName = ConstructNameMap[keyof ConstructNameMap]

ConstructNameMap

Interface of registered constructs (TypeScript type).

Type
interface ConstructNameMap { /* see code */ }

When working on extensions that use new constructs, extend the corresponding interface to register its name:

declare module 'mdast-util-to-markdown' {
  interface ConstructNameMap {
    // Register a new construct name (value is used, key should match it).
    gfmStrikethrough: 'gfmStrikethrough'
  }
}

Handle

Handle a particular node (TypeScript type).

Parameters
Returns

Serialized markdown representing node (string).

Handlers

Handle particular nodes (TypeScript type).

Each key is a node type (Node['type']), each value its corresponding handler (Handle).

Type
type Handlers = Record<Node['type'], Handle>
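
As a rough illustration of the shape described above, a handlers object could look like the following sketch. The `mention` node type and its `value` field are hypothetical (they are not part of mdast), and the handler ignores the other arguments a real handler receives and does no escaping, so treat it as an outline rather than a finished handler.

```javascript
// Hypothetical custom node: {type: 'mention', value: 'wooorm'}.
// Each key is a node type, each value a function that returns markdown.
const customHandlers = {
  mention(node) {
    // No escaping is done here; a real handler would likely need it.
    return '@' + node.value
  }
}

console.log(customHandlers.mention({type: 'mention', value: 'wooorm'})) // @wooorm
```

An object like this would be passed as `options.handlers`.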

Info

Info on the surrounding of the node that is serialized (TypeScript type).

Fields

Join

How to join two blocks (TypeScript type).

“Blocks” are typically joined by one blank line. Sometimes it’s nicer to have them flush next to each other, yet other times they cannot occur together at all.

Join functions receive two adjacent siblings and their parent. What they return defines how many blank lines to use between the siblings.

Parameters
Returns

How many blank lines to use between the siblings (boolean, number, optional).

Here true is the same as passing 1, and false means the nodes cannot be joined by a blank line, such as two adjacent block quotes or indented code after a list, in which case a comment will be injected to break them up:

> Quote 1

<!---->

> Quote 2

👉 Note: abusing this feature will break markdown. One such example is returning 0 for two paragraphs, which results in the text running together and, when parsed again, being seen as one paragraph.
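
To make the return values concrete, here is a minimal sketch of a join function. The name `joinHeadingAndParagraph` is made up for illustration, and the sketch assumes ATX headings, where a paragraph can safely start on the line right after the heading.

```javascript
// Sketch of a join function: keep a paragraph flush under the heading
// that precedes it, and leave every other pair to the defaults.
function joinHeadingAndParagraph(left, right) {
  if (left.type === 'heading' && right.type === 'paragraph') {
    return 0 // No blank line between the two.
  }
  // Falling through returns `undefined`: no opinion on this pair.
}

console.log(joinHeadingAndParagraph({type: 'heading'}, {type: 'paragraph'})) // 0
```

A function like this would be passed in the `options.join` list.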

Map

Map function to pad a single line (TypeScript type).

Parameters
Returns

Padded line (string).

Options

Configuration (TypeScript type).

Fields

The following fields influence how markdown is serialized.

options.bullet

Marker to use for bullets of items in unordered lists ('*', '+', or '-', default: '*').

There are three cases where the primary bullet cannot be used:

options.bulletOther

Marker to use in certain cases where the primary bullet doesn’t work ('*', '+', or '-', default: '-' when bullet is '*', '*' otherwise).

Cannot be equal to bullet.

options.bulletOrdered

Marker to use for bullets of items in ordered lists ('.' or ')', default: '.').

There is one case where the primary bullet for ordered items cannot be used:

options.closeAtx

Whether to add the same number of number signs (#) at the end of an ATX heading as the opening sequence (boolean, default: false).

options.emphasis

Marker to use for emphasis ('*' or '_', default: '*').

options.fence

Marker to use for fenced code ('`' or '~', default: '`').

options.fences

Whether to use fenced code always (boolean, default: true). When false, fenced code is used only if there is a language defined, if the code is empty, or if it starts or ends in blank lines; indented code is used otherwise.

options.incrementListMarker

Whether to increment the counter of ordered lists items (boolean, default: true).
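
The effect of this option, together with `bulletOrdered`, can be pictured with a small standalone sketch. The function `orderedMarkers` and its `start` parameter are hypothetical stand-ins used only for illustration; this mimics the documented behavior and is not the library's code.

```javascript
// Illustration only: mimics the documented effect of
// `incrementListMarker` and `bulletOrdered` on list item markers.
function orderedMarkers(count, options = {}) {
  const {incrementListMarker = true, bulletOrdered = '.', start = 1} = options
  const markers = []
  for (let index = 0; index < count; index++) {
    // Either count up from `start`, or repeat `start` for every item.
    const number = incrementListMarker ? start + index : start
    markers.push(String(number) + bulletOrdered)
  }
  return markers
}

console.log(orderedMarkers(3).join(' ')) // 1. 2. 3.
console.log(orderedMarkers(3, {incrementListMarker: false}).join(' ')) // 1. 1. 1.
console.log(orderedMarkers(2, {bulletOrdered: ')'}).join(' ')) // 1) 2)
```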

options.listItemIndent

How to indent the content of list items ('mixed', 'one', or 'tab', default: 'one'). Either with the size of the bullet plus one space (when 'one'), a tab stop ('tab'), or depending on the item and its parent list ('mixed', uses 'one' if the item and list are tight and 'tab' otherwise).

options.quote

Marker to use for titles ('"' or "'", default: '"').

options.resourceLink

Whether to always use resource links (boolean, default: false). The default is to use autolinks (<https://example.com>) when possible and resource links ([text](url)) otherwise.

options.rule

Marker to use for thematic breaks ('*', '-', or '_', default: '*').

options.ruleRepetition

Number of markers to use for thematic breaks (number, default: 3, min: 3).

options.ruleSpaces

Whether to add spaces between markers in thematic breaks (boolean, default: false).
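
How `rule`, `ruleRepetition`, and `ruleSpaces` combine can be sketched in a few lines. The helper `renderRule` below is a hypothetical illustration of the documented behavior, not the library's implementation.

```javascript
// Illustration only: builds a thematic break from the three rule options.
function renderRule(options = {}) {
  const {rule = '*', ruleRepetition = 3, ruleSpaces = false} = options
  if (ruleRepetition < 3) {
    throw new Error('`ruleRepetition` must be at least 3')
  }
  const markers = Array.from({length: ruleRepetition}, () => rule)
  // Spaces go between markers, never after the last one.
  return markers.join(ruleSpaces ? ' ' : '')
}

console.log(renderRule()) // ***
console.log(renderRule({rule: '-', ruleSpaces: true})) // - - -
console.log(renderRule({rule: '_', ruleRepetition: 5})) // _____
```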

options.setext

Whether to use setext headings when possible (boolean, default: false). The default is to always use ATX headings (# heading) instead of setext headings (heading\n=======). Setext headings cannot be used for empty headings or headings with a rank of three or more.
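
The interplay between `setext` and `closeAtx` can be pictured with the following sketch. The helper `renderHeading` is hypothetical, assumes single-line heading text, and assumes the setext underline is as long as the text; it illustrates the rules described above rather than reproducing the library's code.

```javascript
// Illustration only: picks a heading style from `setext` and `closeAtx`.
function renderHeading(text, depth, options = {}) {
  const {setext = false, closeAtx = false} = options
  // Setext is only possible for non-empty headings of rank 1 or 2.
  if (setext && text.length > 0 && depth < 3) {
    const marker = depth === 1 ? '=' : '-'
    return text + '\n' + marker.repeat(text.length)
  }
  // Otherwise fall back to an ATX heading.
  const sequence = '#'.repeat(depth)
  let value = text ? sequence + ' ' + text : sequence
  if (closeAtx) value += ' ' + sequence
  return value
}

console.log(renderHeading('Title', 2)) // ## Title
console.log(renderHeading('Title', 2, {closeAtx: true})) // ## Title ##
console.log(renderHeading('Title', 3, {setext: true})) // ### Title
```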

options.strong

Marker to use for strong ('*' or '_', default: '*').

options.tightDefinitions

Whether to join definitions without a blank line (boolean, default: false).

The default is to add blank lines between any flow (“block”) construct. Turning this option on is a shortcut for a Join function like so:

function joinTightDefinitions(left, right) {
  if (left.type === 'definition' && right.type === 'definition') {
    return 0
  }
}

options.handlers

Handle particular nodes (Handlers, optional).

options.join

How to join blocks (Array<Join>, optional).

options.unsafe

Schemas that define when characters cannot occur (Array<Unsafe>, optional).

options.extensions

List of extensions (Array<Options>, default: []). Each extension is an object with the same interface as Options itself.
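
Since an extension has the same shape as `Options`, a configuration that uses one might be sketched as follows. The names `tightDefinitionsExtension` and `serializeOptions` are hypothetical, and the marker choices are arbitrary.

```javascript
// Join function as described under `options.tightDefinitions`.
function joinTightDefinitions(left, right) {
  if (left.type === 'definition' && right.type === 'definition') {
    return 0
  }
}

// A hypothetical extension: just a partial options object.
const tightDefinitionsExtension = {join: [joinTightDefinitions]}

// Options that set a few markers and pull in the extension.
const serializeOptions = {
  bullet: '-',
  emphasis: '_',
  extensions: [tightDefinitionsExtension]
}
```

An object like `serializeOptions` would be passed as the second argument to `toMarkdown`.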

SafeConfig

Configuration passed to state.safe (TypeScript type).

Fields

State

Info passed around about the current state (TypeScript type).

Fields

Tracker

Track positional info in the output (TypeScript type).

This info isn’t used yet but such functionality will allow line wrapping, source maps, etc.

Fields

Unsafe

Schema that defines when a character cannot occur (TypeScript type).

Fields

List of extensions

Syntax

Markdown is serialized according to CommonMark but care is taken to format in such a way that the resulting markdown should work with most markdown parsers. Extensions can add support for custom syntax.

Syntax tree

The syntax tree is mdast.

Types

This package is fully typed with TypeScript. It exports the additional types ConstructName, ConstructNameMap, Handle, Handlers, Info, Join, Map, Options, SafeConfig, State, and Unsafe.

Compatibility

Projects maintained by the unified collective are compatible with maintained versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of Node. This means we try to keep the current release line, mdast-util-to-markdown@^2, compatible with Node.js 16.

Security

mdast-util-to-markdown will do its best to serialize markdown to match the syntax tree, but there are several cases where that is impossible. Complete roundtripping cannot be guaranteed, given that any value could be injected into the tree.

As markdown is sometimes turned into HTML, and improper use of HTML can open you up to a cross-site scripting (XSS) attack, serializing with mdast-util-to-markdown and parsing the result again later could potentially be unsafe. When parsing markdown afterwards and then going to HTML, use something like hast-util-sanitize to make the tree safe.

Related

Contribute

See contributing.md in syntax-tree/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer