Closed Hamms closed 5 years ago
@Hamms what I've been using is the toHast
option
https://github.com/remarkjs/remark-react#optionstohast
With:
// allow inline html to be rendered as React VDOM
import rehype from 'rehype-dom-parse';
// ....
.use(remarkReact, {
toHast: {
handlers: {
html: (h, node) =>
// process raw HTML text into HAST so react remark can process it
unified()
.use(rehype, { fragment: true })
.parse(node.value).children
}
}
}
)
Hmm, I believe you could also do what @ChristianMurphy suggests, but with a dangerouslySetInnerHTML: {__html: '...'}
prop?
Maybe we need a top-level allowDangerousHTML
option to do this by default?
It's not clear to me why the allowDangerousHTML
option I'm currently passing via toHast
isn't working. Is it supposed to?
Looking closer, this doesn't appear to be an issue with toHast
at all, but rather with both the sanitization step and the hast-to-hyperscript
step, both of which strip out the raw nodes generated by mdast-util-to-hast
.
Taking a step back, it's not clear to me what the intent of this library is. For my purposes, I'd like to be able to render markdown with raw html inside, but also to sanitize that content against "dangerous" html like https://github.com/syntax-tree/hast-util-sanitize does. So, given an input of
_some_ <strong>raw</strong> html
I get
<p><em>some</em> <strong>raw</strong> html</p>
But given an input of
_some_ <a onclick="alert('hello')">strong</a> html
I get
<p><em>some</em> <a>raw</a> html</p>
I had assumed that was the point of this library, but most of the functionality of hast-util-sanitize only works on actual hast
nodes like script
and a
; when it encounters the raw
nodes generated by mdast-util-to-hast
it simply removes them rather than sanitizing them.
Am I misunderstanding something, or is it not possible to achieve what I want with this tool?
@Hamms the sanitization can be configured with https://github.com/remarkjs/remark-react#optionssanitize
Yes, I'm aware of that. But the problem remains that hast-util-sanitize
does not appear to be capable of sanitizing raw
nodes at all except by eliminating them, meaning that it's not actually capable of sanitizing the input from mdast-util-to-hast
.
Ahhh, it seems like what I actually want to do is to incorporate https://github.com/syntax-tree/hast-util-raw into my process
I've confirmed that adding hast-util-raw to the remark-react process makes this work:
diff --git a/index.js b/index.js
index c85e599..6ee8ff8 100644
--- a/index.js
+++ b/index.js
@@ -7,6 +7,8 @@ var sanitize = require('hast-util-sanitize')
var toH = require('hast-to-hyperscript')
var tableCellStyle = require('@mapbox/hast-util-table-cell-style')
+var raw = require('hast-util-raw')
+
var globalReact
var globalCreateElement
var globalFragment
@@ -46,6 +48,10 @@ function react(options) {
var tree = toHAST(node, toHastOptions)
var root
+ if (toHastOptions.allowDangerousHTML) {
+ tree = raw(tree)
+ }
+
if (clean) {
tree = sanitize(tree, scheme)
}
diff --git a/package.json b/package.json
index b6d5f8c..36fbe02 100644
--- a/package.json
+++ b/package.json
@@ -28,6 +28,7 @@
"dependencies": {
"@mapbox/hast-util-table-cell-style": "^0.1.3",
"hast-to-hyperscript": "^6.0.0",
+ "hast-util-raw": "^5.0.0",
"hast-util-sanitize": "^1.0.0",
"mdast-util-to-hast": "^4.0.0"
},
diff --git a/test/index.js b/test/index.js
index d6cb228..0e83a8c 100644
--- a/test/index.js
+++ b/test/index.js
@@ -121,6 +121,33 @@ versions.forEach(function(reactVersion) {
'passes toHast options to inner toHAST() function'
)
+ t.equal(
+ React.renderToStaticMarkup(
+ remark()
+ .use(reactRenderer, {
+ createElement: React.createElement,
+ toHast: {allowDangerousHTML: true}
+ })
+ .processSync('<strong>raw</strong> html').contents
+ ),
+ '<p><strong>raw</strong> html</p>',
+ 'renders raw html when specified'
+ )
+
+ t.equal(
+ React.renderToStaticMarkup(
+ remark()
+ .use(reactRenderer, {
+ createElement: React.createElement,
+ toHast: {allowDangerousHTML: true}
+ })
+ .processSync('<a onclick="alert("charlie")">delta</a>')
+ .contents
+ ),
+ '<p><a>delta</a></p>',
+ 'raw html is sanitized'
+ )
+
fixtures.forEach(function(name) {
var base = path.join(root, name)
var input = fs.readFileSync(path.join(base, 'input.md'))
Any objections to me opening a PR with the above change?
Yes, I do object!
Because including hast-util-raw
(or rehype-raw
) includes a full blown HTML parser. And that’s really heavy on the browser.
I would suggest people that want that to go remark -> remark-rehype -> rehype-raw -> rehype-sanitize -> rehype-react instead. We could add a note here in the readme, similar to the note in the intro here, though?
I guess I'm still confused about the intent of this library, then.
The readme implies that the purpose is to be able to render markdown safely, with the features of hast-util-sanitize. But as it's built, the only nodes that end up in the tree that gets sent to hast-util-sanitize are either those specifically by remark - which are already safe - or raw
nodes which may contain something unsafe but which are entirely removed by hast-util-sanitize (and also by hast-to-hyperscript), making the existence of sanitization in this project both redundant and misleading.
Is the point of this to provide a MDAST renderer that doesn't allow raw html at all, or is it to provide one that only renders sanitized HTML?
I guess I'm still confused about the intent of this library, then.
Definitely something we should fix!
The readme implies that the purpose is to be able to render markdown safely
True! But also that it doesn’t use .dangerouslySetInnerHTML
, and including raw nodes kinda defeats that purpose.
But as it's built, the only nodes that end up in the tree [...]
And also anything from hName
, hProperties
, hChildren
, which could be anything, so I disagree with “making the existence of sanitization in this project both redundant and misleading”. (Although we should fix the “misleading” part)
Is the point of this to provide a MDAST renderer that doesn't allow raw html at all, or is it to provide one that only renders sanitized HTML?
The point here is to allow a simple markdown to react renderer that is safe. That includes not rendering unsafe HTML. Not sanitising HTML at all would be super unsafe. Not being safe by default would be really bad for XSS and the like. Raw HTML is an escape hatch for markdown that is inherently unsafe. Except if you really know what you’re doing. In which case you need to include an HTML parser. And then the route is remark -> rehype -> rehype-raw -> rehype-react, which we should definitely document better!
Definitely something we should fix!
I appreciate it! :)
Can you give me an example of markdown input without raw html that would result in unsafe HTML output? I think that would help me understand the concerns this library is intended to protect against.
The XSS problems stem from plugin use, for example, this tree: https://github.com/syntax-tree/hast-util-sanitize#usage
(Or, if the user could write that HTML themselves in Markdown, which would be possible with allowDangerousHTML
)
For those, who is looking for a way to use html tags in markdown with remark-parse
, I leave this recipe here. Thanks to @ChristianMurphy for his suggestion. I've made just couple improvements:
rehype-dom-parse
leads to error: document is not defined
. So I replace it with rehype-parse
.sanitize: false
You can try this snippet in console:
var unified = require('unified')
var remark = require('remark-parse')
var remark2react = require('remark-react');
var ReactDOMServer = require('react-dom/server');
var rehype = require('rehype-parse')
const sample = `
markdown is here
<div style="color:gray;">
text <a href="#">link</a>
</div>
`
const rehypeParser = unified().use(rehype, { fragment: true });
const parser = unified()
.use(remark)
.use(remark2react, {
toHast: {
handlers: {
html: (h, node) =>
// process raw HTML text into HAST so react remark can process it
rehypeParser.parse(node.value).children
}
},
sanitize: false
});
const result = parser.processSync(sample)
const html = ReactDOMServer.renderToStaticMarkup(result.contents)
console.log(html)
output:
<p>markdown is here</p>
<div style="color:gray">
text <a href="#">link</a>
</div>
It looks like it's not possible to get remark-react to render raw html, even when passing the allowDangerousHTML flag through to mdast-util-to-hast.
For example, attempting to add this unit test fails:
with the error:
Is there a different expected configuration to be able to support raw HTML inputs?