clarification regarding inline HTML

jdmarshall commented 4 months ago

I was checking the availability of markdown engines again before picking up with an old project that stalled.

From the comrak documentation, it looks like the parser you use can escape inline HTML. Is that preserved here?

If so, that would be amazing. I have a problem domain where you can't necessarily trust all collaborators, so injection attacks are one of the biggest problems I'm going to have to solve. Being unable to disable arbitrary HTML insertion has been a blocking issue with the other libraries I could find. I really do not want to write my own markdown implementation.

leandrocp commented 3 months ago

@jdmarshall there are 2 options you can use to control unsafe content: unsafe_ and sanitize.

To disallow any unsafe content completely:

MDEx.to_html(~S|
# XSS Test
<a href="javascript:alert(1)">XSS</a>
|,
  render: [unsafe_: false],
  features: [sanitize: true]
)
#=> "<h1>XSS Test</h1>\n<p>XSS</p>\n"

unsafe_ is provided by comrak itself (https://docs.rs/comrak/0.22.0/comrak/struct.RenderOptions.html#structfield.unsafe_), and sanitize is done by ammonia (https://crates.io/crates/ammonia).

If you trust the input to some degree but want to stay on the safe side, you can allow unsafe_ content but still sanitize it:

MDEx.to_html(~S|
# XSS Test
<a href="javascript:alert(1)">XSS</a>
|,
  render: [unsafe_: true],
  features: [sanitize: true]
)
#=> "<h1>XSS Test</h1>\n<p><a rel=\"noopener noreferrer\">XSS</a></p>\n"

That seems to be enough for your use case so I'm closing this issue but feel free to reopen it. Thanks!

jdmarshall commented 2 months ago

Also the way I want to use it, I think I also want escape: true

MDEx.to_html(page.content, render: [unsafe_: false, escape: true] )

leandrocp commented 2 months ago

> Also the way I want to use it, I think I also want escape: true
>
> MDEx.to_html(page.content, render: [unsafe_: false, escape: true] )

Yes, escape: true escapes the raw HTML instead of just removing it. I forgot to mention that option.
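
To summarize the three combinations discussed in this thread, here is a minimal sketch. MarkdownPolicy and its trust-level atoms are hypothetical names made up for illustration, not part of MDEx; the option lists themselves are copied from the comments above and assume the MDEx version current at the time of this issue.

```elixir
# Hypothetical helper (not part of MDEx): maps a trust level to the
# option lists shown in this thread.
defmodule MarkdownPolicy do
  # Untrusted input: do not render raw HTML, and sanitize the output.
  def options(:strict), do: [render: [unsafe_: false], features: [sanitize: true]]

  # Partly trusted input: let raw HTML through, but still sanitize it.
  def options(:sanitized), do: [render: [unsafe_: true], features: [sanitize: true]]

  # Escape raw HTML instead of just removing it.
  def options(:escaped), do: [render: [unsafe_: false, escape: true]]
end

# Usage, assuming the MDEx.to_html/2 call shape used in this thread:
# MDEx.to_html(page.content, MarkdownPolicy.options(:escaped))
```

Keeping the option lists in one place like this is only a convenience; passing them directly to MDEx.to_html as shown earlier works the same way.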

leandrocp / mdex

clarification regarding inline HTML #29