remarkjs / remark-rehype

plugin that turns markdown into HTML to support rehype
https://remark.js.org
MIT License
275 stars 18 forks source link
hast html markdown mdast rehype remark

remark-rehype

Build Coverage Downloads Size Sponsors Backers Chat

remark plugin that turns markdown into HTML to support rehype.

Contents

What is this?

This package is a unified (remark) plugin that switches from remark (the markdown ecosystem) to rehype (the HTML ecosystem). It does this by transforming the current markdown (mdast) syntax tree into an HTML (hast) syntax tree. remark plugins deal with mdast and rehype plugins deal with hast, so plugins used after remark-rehype have to be rehype plugins.

The reason that there are different ecosystems for markdown and HTML is that turning markdown into HTML is, while frequently needed, not the only purpose of markdown. Checking (linting) and formatting markdown are also common use cases for remark and markdown. There are several aspects of markdown that do not translate 1-to-1 to HTML. In some cases markdown contains more information than HTML: for example, there are several ways to add a link in markdown (as in, autolinks: <https://url>, resource links: [label](url), and reference links with definitions: [label][id] and [id]: url). In other cases HTML contains more information than markdown: there are many tags, which add new meaning (semantics), available in HTML that aren’t available in markdown. If there was just one AST, it would be quite hard to perform the tasks that several remark and rehype plugins currently do.

When should I use this?

This project is useful when you want to turn markdown to HTML. It opens up a whole new ecosystem with tons of plugins to do all kinds of things. You can minify HTML, format HTML, make sure it’s safe, highlight code, add metadata, and a lot more.

A different plugin, rehype-raw, adds support for raw HTML written inside markdown. This is a separate plugin because supporting HTML inside markdown is a heavy task (performance and bundle size) and not always needed. To use both together, you also have to configure remark-rehype with allowDangerousHtml: true and then use rehype-raw.

The rehype plugin rehype-remark does the inverse of this plugin. It turns HTML into markdown.

If you don’t use plugins and want to access syntax trees, you can use mdast-util-to-hast.

Install

This package is ESM only. In Node.js (version 16+), install with npm:

npm install remark-rehype

In Deno with esm.sh:

import remarkRehype from 'https://esm.sh/remark-rehype@11'

In browsers with esm.sh:

<script type="module">
  import remarkRehype from 'https://esm.sh/remark-rehype@11?bundle'
</script>

Use

Say our document example.md contains:

# Pluto

**Pluto** (minor-planet designation: **134340 Pluto**) is a
[dwarf planet](https://en.wikipedia.org/wiki/Dwarf_planet) in the
[Kuiper belt](https://en.wikipedia.org/wiki/Kuiper_belt).

…and our module example.js contains:

import rehypeDocument from 'rehype-document'
import rehypeFormat from 'rehype-format'
import rehypeStringify from 'rehype-stringify'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import {read} from 'to-vfile'
import {unified} from 'unified'
import {reporter} from 'vfile-reporter'

const file = await unified()
  .use(remarkParse)
  .use(remarkRehype)
  .use(rehypeDocument)
  .use(rehypeFormat)
  .use(rehypeStringify)
  .process(await read('example.md'))

console.error(reporter(file))
console.log(String(file))

…then running node example.js yields:

example.md: no issues found
<!doctype html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>example</title>
    <meta content="width=device-width, initial-scale=1" name="viewport">
  </head>
  <body>
    <h1>Pluto</h1>
    <p>
      <strong>Pluto</strong> (minor-planet designation: <strong>134340 Pluto</strong>) is a
      <a href="https://en.wikipedia.org/wiki/Dwarf_planet">dwarf planet</a> in the
      <a href="https://en.wikipedia.org/wiki/Kuiper_belt">Kuiper belt</a>.
    </p>
  </body>
</html>

API

This package exports the identifiers defaultFootnoteBackContent, defaultFootnoteBackLabel, and defaultHandlers. The default export is remarkRehype.

defaultFootnoteBackContent(referenceIndex, rereferenceIndex)

See defaultFootnoteBackContent from mdast-util-to-hast

defaultFootnoteBackLabel(referenceIndex, rereferenceIndex)

See defaultFootnoteBackLabel from mdast-util-to-hast

defaultHandlers

See defaultHandlers from mdast-util-to-hast

unified().use(remarkRehype[, destination][, options])

Turn markdown into HTML.

Parameters
Returns

Transform (Transformer).

Notes
Signature

👉 Note: It’s highly unlikely that you want to pass a processor.

HTML

Raw HTML is available in mdast as html nodes and can be embedded in hast as semistandard raw nodes. Most plugins ignore raw nodes but two notable ones don’t:

Footnotes

Many options supported here relate to footnotes. Footnotes are not specified by CommonMark, which we follow by default. They are supported by GitHub, so footnotes can be enabled in markdown with remark-gfm.

The options footnoteBackLabel and footnoteLabel define natural language that explains footnotes, which is hidden for sighted users but shown to assistive technology. When your page is not in English, you must define translated values.

Back references use ARIA attributes, but the section label itself uses a heading that is hidden with an sr-only class. To show it to sighted users, define different attributes in footnoteLabelProperties.

Clobbering

Footnotes introduces a problem, as it links footnote calls to footnote definitions on the page through id attributes generated from user content, which results in DOM clobbering.

DOM clobbering is this:

<p id=x></p>
<script>alert(x) // `x` now refers to the DOM `p#x` element</script>

Elements by their ID are made available by browsers on the window object, which is a security risk. Using a prefix solves this problem.

More information on how to handle clobbering and the prefix is explained in Example: headings (DOM clobbering) in rehype-sanitize.

Unknown nodes

Unknown nodes are nodes with a type that isn’t in handlers or passThrough. The default behavior for unknown nodes is:

This behavior can be changed by passing an unknownHandler.

Options

Configuration (TypeScript type).

Fields

Examples

Example: supporting HTML in markdown naïvely

If you completely trust the authors of the input markdown and want to allow them to write HTML inside markdown, you can pass allowDangerousHtml to remark-rehype and rehype-stringify:

import rehypeStringify from 'rehype-stringify'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import {unified} from 'unified'

const file = await unified()
  .use(remarkParse)
  .use(remarkRehype, {allowDangerousHtml: true})
  .use(rehypeStringify, {allowDangerousHtml: true})
  .process('<a href="https://github.com/remarkjs/remark-rehype/blob/main/wiki/Dysnomia_(moon)" onclick="alert(1)">Dysnomia</a>')

console.log(String(file))

Yields:

<p><a href="https://github.com/remarkjs/remark-rehype/blob/main/wiki/Dysnomia_(moon)" onclick="alert(1)">Dysnomia</a></p>

⚠️ Danger: observe that the XSS attack through onclick is present.

Example: supporting HTML in markdown properly

If you do not trust the authors of the input markdown, or if you want to make sure that rehype plugins can see HTML embedded in markdown, use rehype-raw. The following example passes allowDangerousHtml to remark-rehype, then turns the raw embedded HTML into proper HTML nodes with rehype-raw, and finally sanitizes the HTML by only allowing safe things with rehype-sanitize:

import rehypeSanitize from 'rehype-sanitize'
import rehypeStringify from 'rehype-stringify'
import rehypeRaw from 'rehype-raw'
import remarkParse from 'remark-parse'
import remarkRehype from 'remark-rehype'
import {unified} from 'unified'

const file = await unified()
  .use(remarkParse)
  .use(remarkRehype, {allowDangerousHtml: true})
  .use(rehypeRaw)
  .use(rehypeSanitize)
  .use(rehypeStringify)
  .process('<a href="https://github.com/remarkjs/remark-rehype/blob/main/wiki/Dysnomia_(moon)" onclick="alert(1)">Dysnomia</a>')

console.log(String(file))

Running that code yields:

<p><a href="https://github.com/remarkjs/remark-rehype/blob/main/wiki/Dysnomia_(moon)">Dysnomia</a></p>

⚠️ Danger: observe that the XSS attack through onclick is not present.

Example: footnotes in languages other than English

If you know that the markdown is authored in a language other than English, and you’re using remark-gfm to match how GitHub renders markdown, and you know that footnotes are (or can?) be used, you should translate the labels associated with them.

Let’s first set the stage:

import {unified} from 'unified'
import remarkParse from 'remark-parse'
import remarkGfm from 'remark-gfm'
import remarkRehype from 'remark-rehype'
import rehypeStringify from 'rehype-stringify'

const doc = `
Ceres ist nach der römischen Göttin des Ackerbaus benannt;
ihr astronomisches Symbol ist daher eine stilisierte Sichel: ⚳.[^nasa-2015]

[^nasa-2015]: JPL/NASA:
    [*What is a Dwarf Planet?*](https://www.jpl.nasa.gov/infographics/what-is-a-dwarf-planet)
    In: Jet Propulsion Laboratory.
    22. April 2015,
    abgerufen am 19. Januar 2022 (englisch).
`

const file = await unified()
  .use(remarkParse)
  .use(remarkGfm)
  .use(remarkRehype)
  .use(rehypeStringify)
  .process(doc)

console.log(String(file))

Yields:

<p>Ceres ist nach der römischen Göttin des Ackerbaus benannt;
ihr astronomisches Symbol ist daher eine stilisierte Sichel: ⚳.<sup><a href="#user-content-fn-nasa-2015" id="user-content-fnref-nasa-2015" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
<ol>
<li id="user-content-fn-nasa-2015">
<p>JPL/NASA:
<a href="https://www.jpl.nasa.gov/infographics/what-is-a-dwarf-planet"><em>What is a Dwarf Planet?</em></a>
In: Jet Propulsion Laboratory.
22. April 2015,
abgerufen am 19. Januar 2022 (englisch). <a href="#user-content-fnref-nasa-2015" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
</li>
</ol>
</section>

This is a mix of English and German that isn’t very accessible, such as that screen readers can’t handle it nicely. Let’s say our program does know that the markdown is in German. In that case, it’s important to translate and define the labels relating to footnotes so that screen reader users can properly pronounce the page:

@@ -18,7 +18,16 @@ ihr astronomisches Symbol ist daher eine stilisierte Sichel: ⚳.[^nasa-2015]
 const file = await unified()
   .use(remarkParse)
   .use(remarkGfm)
-  .use(remarkRehype)
+  .use(remarkRehype, {
+    footnoteBackLabel(referenceIndex, rereferenceIndex) {
+      return (
+        'Hochspringen nach: ' +
+        (referenceIndex + 1) +
+        (rereferenceIndex > 1 ? '-' + rereferenceIndex : '')
+      )
+    },
+    footnoteLabel: 'Fußnoten'
+  })
   .use(rehypeStringify)
   .process(doc)

Running the code with the above patch applied, yields:

@@ -1,13 +1,13 @@
 <p>Ceres ist nach der römischen Göttin des Ackerbaus benannt;
 ihr astronomisches Symbol ist daher eine stilisierte Sichel: ⚳.<sup><a href="#user-content-fn-nasa-2015" id="user-content-fnref-nasa-2015" data-footnote-ref aria-describedby="footnote-label">1</a></sup></p>
-<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Footnotes</h2>
+<section data-footnotes class="footnotes"><h2 class="sr-only" id="footnote-label">Fußnoten</h2>
 <ol>
 <li id="user-content-fn-nasa-2015">
 <p>JPL/NASA:
 <a href="https://www.jpl.nasa.gov/infographics/what-is-a-dwarf-planet"><em>What is a Dwarf Planet?</em></a>
 In: Jet Propulsion Laboratory.
 22. April 2015,
-abgerufen am 19. Januar 2022 (englisch). <a href="#user-content-fnref-nasa-2015" data-footnote-backref="" aria-label="Back to reference 1" class="data-footnote-backref">↩</a></p>
+abgerufen am 19. Januar 2022 (englisch). <a href="#user-content-fnref-nasa-2015" data-footnote-backref="" aria-label="Hochspringen nach: 1" class="data-footnote-backref">↩</a></p>
 </li>
 </ol>
 </section>

HTML

See Algorithm in mdast-util-to-hast for info on how mdast (markdown) nodes are transformed to hast (HTML).

CSS

Assuming you know how to use (semantic) HTML and CSS, then it should generally be straightforward to style the HTML produced by this plugin. With CSS, you can get creative and style the results as you please.

Some semistandard features, notably GFMs tasklists and footnotes, generate HTML that be unintuitive, as it matches exactly what GitHub produces for their website. There is a project, sindresorhus/github-markdown-css, that exposes the stylesheet that GitHub uses for rendered markdown, which might either be inspirational for more complex features, or can be used as-is to exactly match how GitHub styles rendered markdown.

The following CSS is needed to make footnotes look a bit like GitHub:

/* Style the footnotes section. */
.footnotes {
  font-size: smaller;
  color: #8b949e;
  border-top: 1px solid #30363d;
}

/* Hide the section label for visual users. */
.sr-only {
  position: absolute;
  width: 1px;
  height: 1px;
  padding: 0;
  overflow: hidden;
  clip: rect(0, 0, 0, 0);
  word-wrap: normal;
  border: 0;
}

/* Place `[` and `]` around footnote calls. */
[data-footnote-ref]::before {
  content: '[';
}

[data-footnote-ref]::after {
  content: ']';
}

Syntax tree

This projects turns mdast (markdown) into hast (HTML).

It extends mdast by supporting data fields on mdast nodes to specify how they should be transformed. See Fields on nodes in mdast-util-to-hast for info on how these fields work.

It extends hast by using a semistandard raw nodes for raw HTML. See the HTML note above for more info.

Types

This package is fully typed with TypeScript. It exports the types Options.

The types of mdast-util-to-hast can be referenced to register data fields with @types/mdast and Raw nodes with @types/hast.

/**
 * @import {Root as HastRoot} from 'hast'
 * @import {Root as MdastRoot} from 'mdast'
 * @import {} from 'mdast-util-to-hast'
 */

import {visit} from 'unist-util-visit'

const mdastNode = /** @type {MdastRoot} */ ({/* … */})
console.log(mdastNode.data?.hName) // Typed as `string | undefined`.

const hastNode = /** @type {HastRoot} */ ({/* … */})

visit(hastNode, function (node) {
  // `node` can now be `raw`.
})

Compatibility

Projects maintained by the unified collective are compatible with maintained versions of Node.js.

When we cut a new major release, we drop support for unmaintained versions of Node. This means we try to keep the current release line, remark-rehype@^11, compatible with Node.js 16.

This plugin works with unified version 6+, remark-parse version 3+ (used in remark version 7), and rehype-stringify version 3+ (used in rehype version 5).

Security

Use of remark-rehype can open you up to a cross-site scripting (XSS) attack. Embedded hast properties (hName, hProperties, hChildren) in mdast, custom handlers, and the allowDangerousHtml option all provide openings. Use rehype-sanitize to make the tree safe.

Related

Contribute

See contributing.md in remarkjs/.github for ways to get started. See support.md for ways to get help.

This project has a code of conduct. By interacting with this repository, organization, or community you agree to abide by its terms.

License

MIT © Titus Wormer