remarkjs / react-markdown

Markdown component for React
https://remarkjs.github.io/react-markdown/
MIT License
13.26k stars 876 forks source link

Invalid HTML syntax together with `rehype-raw` causes crash #833

Closed EvHaus closed 5 months ago

EvHaus commented 5 months ago

Initial checklist

Affected packages and versions

react-markdown@9.0.1 + rehype-raw@7.0.0

Link to runnable example

https://stackblitz.com/edit/github-nbnqac?file=src%2Fapp.tsx

Steps to reproduce

When react-markdown is used together with rehype-raw and invalid HTML is provided inside the markdown content, it causes a crash with:

TypeError: Cannot read properties of null (reading 'tagName')
    at Tokenizer._stateTagName (index.js:1095:1)
    at Tokenizer._callState (index.js:586:1)
    at Tokenizer._runParsingLoop (index.js:223:1)
    at Tokenizer.write (index.js:248:1)
    at handleRaw (index.js:349:1)
    at one (index.js:108:1)
    at Object.handle (index.js:100:1)
    at all (index.js:156:1)
    at root (index.js:172:1)
    at one (index.js:108:1)

See Stackblitz link for a repro, or use this code:

import Markdown from 'react-markdown';
import rehypeRaw from 'rehype-raw';

const markdownSource = `
Note how this markdown has invalid HTML below. The closing "summary" tag is missing a closing angle bracket.

<details>
<summary>Complete logs</summary
</details>
`;

function App() {
  return <Markdown rehypePlugins={[[rehypeRaw]]}>{markdownSource}</Markdown>;
}

export default App;

Expected behavior

No crash. Ideally rehype-raw could be smart enough to auto-close the tag (like GitHub's markdown editor does), but if that's too difficult, simply not rendering that element is fine too.

Actual behavior

Code crashes with the error above.

Runtime

Node v17

Package manager

npm 8

OS

Linux

Build and bundle tools

Webpack

wooorm commented 5 months ago

Can also be reproduced by pasting that code at https://remarkjs.github.io/react-markdown/

wooorm commented 5 months ago

Can also be reproduced with just the utilities:

import assert from 'node:assert/strict'
import {fromMarkdown} from 'mdast-util-from-markdown'
import {toHast} from 'mdast-util-to-hast'
import {raw} from 'hast-util-raw'
import {toHtml} from 'hast-util-to-html'

const document = `
Note how this markdown has invalid HTML below. The closing "summary" tag is missing a closing angle bracket.

<details>
<summary>Complete logs</summary

</details>
`

const mdast = fromMarkdown(document)
const hast = toHast(mdast, {allowDangerousHtml: true})
assert(hast.type === 'root')

const reformatted = raw(hast)

console.log(reformatted)
console.log(toHtml(reformatted))
file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:1095
                token.tagName += String.fromCodePoint(isAsciiUpper(cp) ? toAsciiLower(cp) : cp);
                ^

TypeError: Cannot read properties of null (reading 'tagName')
    at Tokenizer._stateTagName (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:1095:17)
    at Tokenizer._callState (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:586:22)
    at Tokenizer._runParsingLoop (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:223:22)
    at Tokenizer.write (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/parse5/dist/tokenizer/index.js:248:14)
    at handleRaw (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:345:26)
    at one (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/zwitch/index.js:108:17)
    at Object.handle (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:96:7)
    at all (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:152:13)
    at root (file:///Users/tilde/Projects/oss/hast-util-raw/lib/index.js:168:3)
    at one (file:///Users/tilde/Projects/oss/hast-util-raw/node_modules/zwitch/index.js:108:17)
wooorm commented 5 months ago

Even smaller, and making the exception disappear:

/**
 * @typedef {import('hast').Root} Root
 */

import {raw} from 'hast-util-raw'
import {toHtml} from 'hast-util-to-html'

/** @type {Root} */
const hast = {
  type: 'root',
  children: [
    {type: 'raw', value: '<details>\n<summary>Complete logs</summary'},
    {type: 'text', value: '\n'},
    {type: 'raw', value: '</details>'}
  ]
}

const reformatted = raw(hast)

console.log(reformatted)
console.log(toHtml(reformatted))

Removing the {type: 'text', value: '\n'}, makes the exception disappear.

We intentionally do want to support a {type: 'raw', value: '<sum'}, {type: 'raw', value: 'mary>'} to work. So we have to allow “partial”/“broken” HTML. But, when a text node appears, we can perhaps reset that.

github-actions[bot] commented 5 months ago

Hi! This was closed. Team: If this was fixed, please add phase/solved. Otherwise, please add one of the no/* labels.

wooorm commented 5 months ago

released!

EvHaus commented 5 months ago

Awesome thanks. Confirmed that hast-util-raw@9.0.4 fixes the issue.