remarkjs / remark-react

Deprecated plugin to transform to React — please use `remark-rehype` and `rehype-react` instead
524 stars 37 forks source link

allowDangerousHTML doesn't work #63

Closed Hamms closed 5 years ago

Hamms commented 5 years ago

It looks like it's not possible to get remark-react to render raw html, even when passing the allowDangerousHTML flag through to mdast-util-to-hast.

For example, attempting to add this unit test fails:

diff --git a/test/index.js b/test/index.js
index d6cb228..22d347e 100644
--- a/test/index.js
+++ b/test/index.js
@@ -121,6 +121,19 @@ versions.forEach(function(reactVersion) {
       'passes toHast options to inner toHAST() function'
     )

+    t.equal(
+      React.renderToStaticMarkup(
+        remark()
+          .use(reactRenderer, {
+            createElement: React.createElement,
+            toHast: {allowDangerousHTML: true}
+          })
+          .processSync('<strong>raw</strong> html').contents
+      ),
+      '<p><strong>raw</strong> html</p>',
+      'renders raw html when specified'
+    )
+
     fixtures.forEach(function(name) {
       var base = path.join(root, name)
       var input = fs.readFileSync(path.join(base, 'input.md'))

with the error:

not ok 6 renders raw html when specified
  ---
    operator: equal
    expected: '<p><strong>raw</strong> html</p>'
    actual:   '<p>raw html</p>'

Is there a different expected configuration to be able to support raw HTML inputs?

ChristianMurphy commented 5 years ago

@Hamms what I've been using is the toHast option https://github.com/remarkjs/remark-react#optionstohast

With:

// allow inline html to be rendered as React VDOM
import rehype from 'rehype-dom-parse';

// ....
     .use(remarkReact, {
          toHast: {
            handlers: {
              html: (h, node) =>
                // process raw HTML text into HAST so react remark can process it
                unified()
                  .use(rehype, { fragment: true })
                  .parse(node.value).children
            }
          }
        }
      )
wooorm commented 5 years ago

Hmm, I believe you could also do what @ChristianMurphy suggests, but with a dangerouslySetInnerHTML: {__html: '...'} prop?

Maybe we need a top-level allowDangerousHTML option to do this by default?

Hamms commented 5 years ago

It's not clear to me why the allowDangerousHTML option I'm currently passing via toHast isn't working. Is it supposed to?

Hamms commented 5 years ago

Looking closer, this doesn't appear to be an issue with toHast at all, but rather with both the sanitization step and the hast-to-hyperscript step, both of which strip out the raw nodes generated by mdast-util-to-hast.

Taking a step back, it's not clear to me what the intent of this library is. For my purposes, I'd like to be able to render markdown with raw html inside, but also to sanitize that content against "dangerous" html like https://github.com/syntax-tree/hast-util-sanitize does. So, given an input of

_some_ <strong>raw</strong> html

I get

<p><em>some</em> <strong>raw</strong> html</p>

But given an input of

_some_ <a onclick="alert('hello')">strong</a> html

I get

<p><em>some</em> <a>raw</a> html</p>

I had assumed that was the point of this library, but most of the functionality of hast-util-sanitize only works on actual hast nodes like script and a; when it encounters the raw nodes generated by mdast-util-to-hast it simply removes them rather than sanitizing them.

Am I misunderstanding something, or is it not possible to achieve what I want with this tool?

ChristianMurphy commented 5 years ago

@Hamms the sanitization can be configured with https://github.com/remarkjs/remark-react#optionssanitize

Hamms commented 5 years ago

Yes, I'm aware of that. But the problem remains that hast-util-sanitize does not appear to be capable of sanitizing raw nodes at all except by eliminating them, meaning that it's not actually capable of sanitizing the input from mdast-util-to-hast.

Hamms commented 5 years ago

Ahhh, it seems like what I actually want to do is to incorporate https://github.com/syntax-tree/hast-util-raw into my process

Hamms commented 5 years ago

I've confirmed that adding hast-util-raw to the remark-react process makes this work:

diff --git a/index.js b/index.js
index c85e599..6ee8ff8 100644
--- a/index.js
+++ b/index.js
@@ -7,6 +7,8 @@ var sanitize = require('hast-util-sanitize')
 var toH = require('hast-to-hyperscript')
 var tableCellStyle = require('@mapbox/hast-util-table-cell-style')

+var raw = require('hast-util-raw')
+
 var globalReact
 var globalCreateElement
 var globalFragment
@@ -46,6 +48,10 @@ function react(options) {
     var tree = toHAST(node, toHastOptions)
     var root

+    if (toHastOptions.allowDangerousHTML) {
+      tree = raw(tree)
+    }
+
     if (clean) {
       tree = sanitize(tree, scheme)
     }
diff --git a/package.json b/package.json
index b6d5f8c..36fbe02 100644
--- a/package.json
+++ b/package.json
@@ -28,6 +28,7 @@
   "dependencies": {
     "@mapbox/hast-util-table-cell-style": "^0.1.3",
     "hast-to-hyperscript": "^6.0.0",
+    "hast-util-raw": "^5.0.0",
     "hast-util-sanitize": "^1.0.0",
     "mdast-util-to-hast": "^4.0.0"
   },
diff --git a/test/index.js b/test/index.js
index d6cb228..0e83a8c 100644
--- a/test/index.js
+++ b/test/index.js
@@ -121,6 +121,33 @@ versions.forEach(function(reactVersion) {
       'passes toHast options to inner toHAST() function'
     )

+    t.equal(
+      React.renderToStaticMarkup(
+        remark()
+          .use(reactRenderer, {
+            createElement: React.createElement,
+            toHast: {allowDangerousHTML: true}
+          })
+          .processSync('<strong>raw</strong> html').contents
+      ),
+      '<p><strong>raw</strong> html</p>',
+      'renders raw html when specified'
+    )
+
+    t.equal(
+      React.renderToStaticMarkup(
+        remark()
+          .use(reactRenderer, {
+            createElement: React.createElement,
+            toHast: {allowDangerousHTML: true}
+          })
+          .processSync('<a onclick="alert(&#x22;charlie&#x22;)">delta</a>')
+          .contents
+      ),
+      '<p><a>delta</a></p>',
+      'raw html is sanitized'
+    )
+
     fixtures.forEach(function(name) {
       var base = path.join(root, name)
       var input = fs.readFileSync(path.join(base, 'input.md'))

Any objections to me opening a PR with the above change?

wooorm commented 5 years ago

Yes, I do object!

Because including hast-util-raw (or rehype-raw) includes a full blown HTML parser. And that’s really heavy on the browser.

I would suggest people that want that to go remark -> remark-rehype -> rehype-raw -> rehype-sanitize -> rehype-react instead. We could add a note here in the readme, similar to the note in the intro here, though?

Hamms commented 5 years ago

I guess I'm still confused about the intent of this library, then.

The readme implies that the purpose is to be able to render markdown safely, with the features of hast-util-sanitize. But as it's built, the only nodes that end up in the tree that gets sent to hast-util-sanitize are either those specifically by remark - which are already safe - or raw nodes which may contain something unsafe but which are entirely removed by hast-util-sanitize (and also by hast-to-hyperscript), making the existence of sanitization in this project both redundant and misleading.

Is the point of this to provide a MDAST renderer that doesn't allow raw html at all, or is it to provide one that only renders sanitized HTML?

wooorm commented 5 years ago

I guess I'm still confused about the intent of this library, then.

Definitely something we should fix!

The readme implies that the purpose is to be able to render markdown safely

True! But also that it doesn’t use .dangerouslySetInnerHTML, and including raw nodes kinda defeats that purpose.

But as it's built, the only nodes that end up in the tree [...]

And also anything from hName, hProperties, hChildren, which could be anything, so I disagree with “making the existence of sanitization in this project both redundant and misleading”. (Although we should fix the “misleading” part)

Is the point of this to provide a MDAST renderer that doesn't allow raw html at all, or is it to provide one that only renders sanitized HTML?

The point here is to allow a simple markdown to react renderer that is safe. That includes not rendering unsafe HTML. Not sanitising HTML at all would be super unsafe. Not being safe by default would be really bad for XSS and the like. Raw HTML is an escape hatch for markdown that is inherently unsafe. Except if you really know what you’re doing. In which case you need to include an HTML parser. And then the route is remark -> rehype -> rehype-raw -> rehype-react, which we should definitely document better!

Hamms commented 5 years ago

Definitely something we should fix!

I appreciate it! :)

Can you give me an example of markdown input without raw html that would result in unsafe HTML output? I think that would help me understand the concerns this library is intended to protect against.

wooorm commented 5 years ago

The XSS problems stem from plugin use, for example, this tree: https://github.com/syntax-tree/hast-util-sanitize#usage

wooorm commented 5 years ago

(Or, if the user could write that HTML themselves in Markdown, which would be possible with allowDangerousHTML)

maestrow commented 4 years ago

For those, who is looking for a way to use html tags in markdown with remark-parse, I leave this recipe here. Thanks to @ChristianMurphy for his suggestion. I've made just couple improvements:

  1. Module rehype-dom-parse leads to error: document is not defined. So I replace it with rehype-parse.
  2. Extract rehypeParser from handler, so it's created only once.
  3. Also notice about sanitize: false

You can try this snippet in console:

var unified = require('unified')
var remark = require('remark-parse')
var remark2react = require('remark-react');
var ReactDOMServer = require('react-dom/server');
var rehype = require('rehype-parse')

const sample = `
markdown is here
<div style="color:gray;">
text <a href="#">link</a>
</div>
`

const rehypeParser = unified().use(rehype, { fragment: true });

const parser = unified()
  .use(remark)
  .use(remark2react, {
    toHast: {
      handlers: {
        html: (h, node) => 
          // process raw HTML text into HAST so react remark can process it
          rehypeParser.parse(node.value).children
      }
    },
    sanitize: false
  });

const result = parser.processSync(sample)
const html = ReactDOMServer.renderToStaticMarkup(result.contents)
console.log(html)

output:

<p>markdown is here</p>
<div style="color:gray">
text <a href="#">link</a>
</div>