mganss / HtmlSanitizer

Cleans HTML to avoid XSS attacks
MIT License
1.52k stars, 198 forks

Sanitizing an anchor tag causes unexpected href URL changes: query param separator & becomes &amp; #401

Closed. rclabo closed this issue 1 year ago.

rclabo commented 1 year ago

Consider the following case:

var html = "<a href=\"http://www.somesite.com?a=1&b=2\">some text</a>";
var sanitizer = new HtmlSanitizer();
var sanitizedHtml = sanitizer.Sanitize(html);

After running, sanitizedHtml will be: <a href="http://www.somesite.com?a=1&amp;b=2">some text</a>

Notice that & was changed to &amp;. While it is of course true that the HTML encoding of & is &amp;, I feel that in this case it should not be encoded, because the & occurs in the href URL as a query parameter separator. I'm concerned that the URL may no longer work when the rendered link is clicked and the site visited.

I did read issue https://github.com/mganss/HtmlSanitizer/issues/116, but the request there is a bit different. Unlike that issue, I'm not suggesting that all & characters be left unencoded, only those in an anchor tag's href that act as query parameter separators.

Thoughts?

mganss commented 1 year ago

Encoding as &amp; is the correct way; see https://stackoverflow.com/a/19442133/1970064
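Why the encoding is harmless can be illustrated with a small sketch. This is not code from the thread. It uses System.Net.WebUtility.HtmlDecode as a stand-in for what an HTML parser does with attribute values: character references such as &amp; are resolved before the browser ever treats the href as a URL, so the link still points at ?a=1&b=2. The exact parser behavior in a given browser is assumed here, not demonstrated.

```csharp
using System;
using System.Net;

class Program
{
    static void Main()
    {
        // The href value exactly as HtmlSanitizer emitted it in the issue.
        string encodedHref = "http://www.somesite.com?a=1&amp;b=2";

        // HtmlDecode resolves character references like &amp; back to &,
        // approximating how an HTML parser reads attribute values
        // before the browser navigates to the URL.
        string decodedHref = WebUtility.HtmlDecode(encodedHref);
        Console.WriteLine(decodedHref);

        // Going the other way, HtmlEncode turns the raw & into &amp;,
        // which is the form the sanitizer produces in its output markup.
        string reEncoded = WebUtility.HtmlEncode("http://www.somesite.com?a=1&b=2");
        Console.WriteLine(reEncoded);
    }
}
```

The first line printed is the original URL with a plain &, and the second is the &amp; form from the sanitized output, which is why the two representations are interchangeable inside an attribute.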