phuocng / html-dom

Common tasks of managing HTML DOM with vanilla JavaScript. Give me 1 ⭐if it’s useful.
https://phuoc.ng/collection/html-dom
MIT License
6.52k stars 459 forks source link

Sanitize HTML strings has flaws but implies that this is an easy problem to solve #281

Open simonw opened 11 months ago

simonw commented 11 months ago

As discussed on Hacker News: https://news.ycombinator.com/item?id=38162435

This article: https://phuoc.ng/collection/html-dom/sanitize-html-strings/ has some problems.

First, it uses value.startsWith('javascript:') to remove javascript: attributes, which misses the case <a href=" javascript:alert('hi')"> - the leading space.

But even if you apply .trim() there are still all sorts of ways this could go wrong - JavaScript: and many others too, probably. A better approach would be to allow-list http:// and https:// URLs.

But... the core problem with that article is that it implies to people that they should feel confident writing their own HTML sanitization mechanisms.

I'm a big supporter of VanillaJS for almost everything else... but HTML sanitization is a wildly difficult problem. Saying things like "It's that easy!" gives a very false sense of how hard it is.

Personally I think that particular article should be removed entirely. If not removed, then it should start (not end) with a very clear warning not to attempt to roll your own solution for this, and promote DOMPurify instead.

The current conclusion reads:

Ensuring the security of your web application is crucial, and sanitizing HTML strings is an essential step in achieving this. The DOMParser API is one tool you can use to get the job done right. By sanitizing your HTML strings, you can prevent malicious code from being executed and protect your users from harm.

It's worth noting that while removing script tags and inline event handlers can help prevent malicious code execution, it's not a bulletproof solution. There are still ways attackers can inject harmful code into your app, such as through CSS styles or exploiting vulnerabilities in your server-side code. That's why it's important to use multiple layers of defense when securing your web application.

This conclusion implies that the DOMParser solution provided in this article "can prevent malicious code from being executed" - that's a strong promise, and one that I don't think you can deliver on in an article like this one.

The second paragraph warns that "it's not a bulletproof solution" and then mixes the issues of CSS styles and vulnerabilities in server-side code - very different problems. This is not a strong enough warning!

(I love everything else about this project aside from this one article.)

phuocng commented 10 months ago

Thanks for the feedback, @simonw 👍 The article's content is available in this repository, so I would greatly appreciate it if you could submit a pull request to update the wording. Thank you!