rehypejs / rehype-raw

plugin to parse the tree again
https://unifiedjs.com
MIT License

InvalidCharacterError: Failed to execute 'createElement' #34

Open bradmsmith opened 1 day ago

bradmsmith commented 1 day ago

Initial checklist

Affected packages and versions

Latest

Link to runnable example

No response

Steps to reproduce

  1. Visit demo site: https://remarkjs.github.io/react-markdown/
  2. Delete demo text and replace with </p><</div>/span>/div><button&#160;onClick=alert(1)>
  3. Select the Use rehype-raw checkbox
  4. View the error in the Developer Tools JavaScript console

Expected behavior

Properly filter out malformed HTML tags and ASCII encoding of space (&#160;)

Actual behavior

Attempts to create an HTML element named <button&>

Note: if this is a bug in a dependency, I would appreciate it if you could direct me to the proper place to report the issue. I am posting it here because it only occurs when the rehypeRaw plugin is enabled for react-markdown.

Runtime

No response

Package manager

No response

OS

macOS

Build and bundle tools

No response

wooorm commented 16 hours ago

Hi!

> malformed HTML tags

It is important to note that there is no “malformed” HTML. The HTML parser defines what happens for every case. Every character does something, and all the characters in your input have a meaning.

> ASCII encoding of space (&#160;)

The term for these things in HTML is character references. The decimal code 160 is hexadecimal A0, which is a non-breaking space, not a regular space.
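
To make the 160 / A0 relationship concrete, here is a small sketch in plain JavaScript. The variable names are only for illustration; nothing here is specific to rehype-raw.

```javascript
// Decimal 160 and hexadecimal A0 name the same code point.
const codePoint = 160;
const hex = codePoint.toString(16).toUpperCase();

// U+00A0 NO-BREAK SPACE is a different character from U+0020 SPACE.
const nbsp = String.fromCodePoint(codePoint);
const space = ' ';
const isRegularSpace = nbsp === space;

console.log(hex, JSON.stringify(nbsp), isRegularSpace);
```

So even where `&#160;` is decoded (in text, or in attribute values), it yields a non-breaking space, which HTML does not treat as whitespace between a tag name and its attributes.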

However, character references are not supported at that position, inside a tag name. So this is not a non-breaking space: the tag name contains the literal characters &#160;.
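
As a rough illustration of why, here is a simplified sketch of how a tag name is read. It is modeled on the “tag name” state of the HTML tokenizer, but it is not the code used by rehype-raw or its parser; `readTagName` is a hypothetical helper, and it assumes the input starts right after the `<`.

```javascript
// Simplified sketch of the HTML tokenizer's "tag name" state.
// Assumption: `input` begins immediately after the "<" of a start tag.
function readTagName(input) {
  let name = '';
  for (const char of input) {
    // The name ends at tab, line feed, form feed, space, "/", or ">".
    if (/[\t\n\f \/>]/.test(char)) break;
    // "&" gets no special treatment here, so no character reference is decoded.
    // ASCII uppercase letters are lowercased; everything else is kept as-is.
    name += char >= 'A' && char <= 'Z' ? char.toLowerCase() : char;
  }
  return name;
}

console.log(readTagName('button&#160;a=b>'));
console.log(readTagName('button a=b>'));
```

With a real space the name stops at `button`. With `&#160;` nothing ends the name until the `>`, so the whole run `button&#160;a=b` becomes the tag name.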


When you run into a problem and raise an issue, I’d always recommend trying to make the reproduction smaller. Here’s what I came up with: <p><button&#160;a=b>.

Now. When loading that in a browser (document.body.innerHTML = '<p><button&#160;a=b>'), you will get this DOM:

<p>
  <button&#160;a=b></button&#160;a=b>
</p>

When creating that element with the DOM, you’d get:

document.createElement('button&#160;a=b')
// [Error: InvalidCharacterError: The string contains invalid characters.]

So. The error you get now with rehype-raw and React is exactly what happens when you do this yourself with the DOM.

I don’t think there’s an alternative.

I think the current behavior is good.