rsms / markdown-wasm

Very fast Markdown parser and HTML generator implemented in WebAssembly, based on md4c
https://rsms.me/markdown-wasm/
MIT License
1.51k stars, 62 forks

Preventing JavaScript URLs upon parsing markdown links #14

Closed. ghost closed this issue 3 years ago

ghost commented 3 years ago

I was looking at using this project for a server-side Markdown parser, but someone I was working with tried out the demo and pointed out that this markdown:

[some text](javascript:alert("xss"))

renders as an actual JavaScript link. Does markdown-wasm offer more fine-grained control to prevent this specific URL type from being parsed?

If not, we can attempt to remove them manually (using JavaScript), but it would be great if it could be dealt with directly in the Wasm parser itself.

lostfictions commented 3 years ago

Looks like that's considered a valid link under the CommonMark spec, which md4c (and thus markdown-wasm) strictly complies with: https://spec.commonmark.org/0.29/#link-destination

I'd recommend just passing output through DOMPurify.

ghost commented 3 years ago

Yes, it is a valid URL, and I have no problem with that, but users might appreciate a flag that prevents the parser from emitting such links.

In my particular case, javascript: links do nothing on my site, since our server sends Content-Security-Policy HTTP headers that strictly deny inline JavaScript and CSS, so I didn't even need to modify the output, as clicking the link is a no-op.
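For readers who want to see what such a header might look like, here is a minimal sketch in JavaScript. The policy value is an assumption chosen for illustration, since the policy actually used on the site is not shown in this thread, and `applyCsp` is a hypothetical helper name. Because `script-src` does not include `'unsafe-inline'`, browsers that enforce CSP refuse inline scripts and `javascript:` URLs.

```javascript
// Assumed policy for illustration only; the real policy is not shown in this thread.
const CSP_VALUE = [
  "default-src 'self'",
  "script-src 'self'", // no 'unsafe-inline', so inline JS and javascript: URLs are blocked
  "style-src 'self'",  // no 'unsafe-inline', so inline CSS is blocked
].join("; ");

// Hypothetical helper: sets the header on a Node.js-style response object
// (anything that exposes setHeader(name, value)).
function applyCsp(res) {
  res.setHeader("Content-Security-Policy", CSP_VALUE);
  return res;
}

// Stand-in response object so the sketch runs without starting a server.
const fakeRes = {
  headers: {},
  setHeader(name, value) {
    this.headers[name] = value;
  },
};

applyCsp(fakeRes);
console.log(fakeRes.headers["Content-Security-Policy"]);
```

With a real Node.js `http` server, the same `applyCsp(res)` call would go at the top of the request handler, before the response body is written.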

Also, it was great to see a Markdown parser that can emit XHTML instead of HTML; DOMPurify would emit HTML, which would break when mixed with XHTML. I am also using this server-side, so a server-side solution would be ideal. DOMPurify does offer a server-side option for Node.js, but it is a rather large library.

rsms commented 3 years ago

Ah, this would definitely be a nice option to add, with allowing such links being opt-in (disallowed by default). For now, a post-processing step on the generated HTML is a good idea if you allow untrusted users to enter & render HTML. I.e. something like /href="javascript:[^"]*/ and replace it with ".
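A post-processing step along these lines could be sketched as follows. This is a rough variant of the regex above, not code from markdown-wasm: `stripJavaScriptHrefs` is a hypothetical name, the pattern is made case-insensitive and also consumes the closing quote, and it assumes the generated HTML always uses double-quoted `href` attributes. A regex filter like this is unlikely to catch every obfuscated spelling of the scheme, so treat it as a stopgap rather than a full sanitizer.

```javascript
// Hypothetical helper (not part of markdown-wasm): blanks out javascript: hrefs
// in HTML that has already been generated.
function stripJavaScriptHrefs(html) {
  // Match href="javascript:..." through the closing quote, ignoring case and
  // any whitespace before the scheme. Assumes double-quoted attributes.
  return html.replace(/href="\s*javascript:[^"]*"/gi, 'href=""');
}

// Sample input; quotes inside the attribute are assumed to be entity-escaped.
const sample = '<p><a href="javAscRipt:alert(&quot;xss&quot;)">XSS test</a></p>';
console.log(stripJavaScriptHrefs(sample));
```

Ordinary links pass through unchanged, because the pattern only matches when the attribute value begins with the `javascript:` scheme.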

rsms commented 3 years ago

4b48783c3e209d900a459a2bd76aa314cb62f7e1 (released with v1.2.0 on NPM) includes a fix for this. Starting with this version, by default any href starting with "javascript:" is now stripped. "javascript:" URIs can be explicitly enabled by setting the option allowJSURIs.

Example input:

[XSS test](javAscRipt:alert("xss"))

Output:

<p><a href="">XSS test</a></p>
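The rule described in this comment can be modelled in a few lines of JavaScript. This is only an illustrative sketch of the described behaviour, not the actual implementation inside the Wasm module: `filterHref` is a hypothetical function, and only the option name `allowJSURIs` comes from the comment above.

```javascript
// Illustrative model only; the real check lives inside markdown-wasm itself.
function filterHref(href, options = {}) {
  // Opt-in flag, named after the option mentioned in the thread.
  const allowJSURIs = options.allowJSURIs === true;
  // Case-insensitive prefix check, so "javAscRipt:" is caught as well.
  const isJSURI = href.toLowerCase().startsWith("javascript:");
  // Disallowed by default: the href is replaced with an empty string.
  return isJSURI && !allowJSURIs ? "" : href;
}

const xss = 'javAscRipt:alert("xss")';
console.log(JSON.stringify(filterHref(xss)));
console.log(JSON.stringify(filterHref(xss, { allowJSURIs: true })));
console.log(JSON.stringify(filterHref("https://rsms.me/markdown-wasm/")));
```

The first call models the default, where the link target is emptied as in the example output; the second models the explicit opt-in, where the URI is kept as written.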