escapeHTML method - Githubissues

jzaefferer commented 11 years ago

Browsers provide methods for encoding and decoding URLs or fragments, but there's nothing built in for escaping HTML. Since this is a pretty common thing to do, it would be useful to have a proper and fast native implementation. The "proper" aspect is more interesting, since it's easy to screw up a custom implementation.

The API needs to be easily polyfillable, and should come with a reference polyfill in the style of MDN's Array prototype extensions.

See QUnit's implementation as a starting point and compare it with a few others. Report against the W3C or WHATWG DOM groups to get a standards discussion started.
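To make the idea concrete, here is a minimal sketch of what such a polyfill might look like. The function name `escapeHTML`, the `HTML_ESCAPES` table and the exact set of escaped characters are assumptions for illustration only. This is not QUnit's code and not a proposed spec, and it only targets text between tags and quoted attribute values.

```javascript
// Hypothetical polyfill sketch: name and character set are assumptions.
// Covers text between tags and *quoted* attribute values only.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHTML(value) {
  // Coerce to string first so numbers, null, etc. do not throw.
  return String(value).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}
```

For example, `escapeHTML("Tom & <b>Jerry</b>")` returns `"Tom &amp; &lt;b&gt;Jerry&lt;/b&gt;"`. Replacing all characters in a single pass avoids the classic mistake of double-escaping the `&` that earlier replacements introduced.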

mathiasbynens commented 11 years ago

Relevant: https://github.com/JSFixed/JSFixed/issues/3

dmethvin commented 11 years ago

See also jQuery bug 11773. I do think it's a very common problem that causes lots of XSS issues. There is some discussion here of how difficult correct escaping is and how it can be context-dependent, meaning that a dev cannot just delegate the responsibility to some utility function and expect to be safe regardless of how it is applied.

mathiasbynens commented 11 years ago

ES6 will have tagged template literals that can help with this sort of thing (although it won’t provide full XSS protection in all cases, e.g. if you’re using unquoted attributes in HTML and allowing user input in their values, you’re still screwed).
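A rough sketch of how a tag function could apply escaping to every interpolated value. The names `safeHtml` and `escapeText` are made up for this example and the escaped character set is an assumption. The snippet needs an engine with template literal support.

```javascript
// Hypothetical helper: escapes the five characters commonly treated as special.
function escapeText(value) {
  const map = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return String(value).replace(/[&<>"']/g, (ch) => map[ch]);
}

// Hypothetical tag function: literal parts pass through untouched,
// interpolated values are escaped before being joined back in.
function safeHtml(strings, ...values) {
  return strings.reduce((out, str, i) => out + escapeText(values[i - 1]) + str);
}

const userName = "<script>alert(1)</script>";
const greeting = safeHtml`<p>Hello, ${userName}!</p>`;

// The unquoted-attribute caveat: nothing in this value needs escaping,
// yet it still injects a second attribute.
const payload = "x onfocus=alert(1)";
const unsafeInput = safeHtml`<input value=${payload}>`;
```

Here `greeting` ends up as `<p>Hello, &lt;script&gt;alert(1)&lt;/script&gt;!</p>`, while `unsafeInput` comes out as `<input value=x onfocus=alert(1)>`, which is exactly the unquoted-attribute problem.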

jzaefferer commented 11 years ago

@mathiasbynens tagged template literals would need a transpiler to make them usable in current browsers, right? The syntax certainly didn't suggest any way of polyfilling them.

Thanks for the JSFixed link. Mike West also suggested to bring this up for DOM specs, since it doesn't make that much sense in ECMAScript.

mathiasbynens commented 11 years ago

> @mathiasbynens tagged template literals would need a transpiler to make them usable in current browsers, right? The syntax certainly didn't suggest any way of polyfilling them.

Yeah, I think so too.

jzaefferer commented 11 years ago

Benchmark for various regex based escaping methods: http://jsperf.com/htmlencoderegex/15

jzaefferer commented 11 years ago

Some details on tagged template literals: https://speakerdeck.com/kitcambridge/es-6-the-refined-parts (slide 21)

jzaefferer commented 9 years ago

Since this is a security-related issue, let's see if I can get some more input on this. /cc @mikewest and @fhemberger

The JSFixed issue linked above doesn't seem to have gone anywhere. I've commented there to ask if there was any effort to get a method into the DOM specification.

fhemberger commented 9 years ago

I'm no expert on this, but you should probably normalize the Unicode characters first, before the actual escaping. (I think Mr. Unicode aka @mathiasbynens can say more about this.)

The characters to escape should be the obvious </>, single and double quotes, backticks (for template literals) and probably square brackets, too (to avoid funny evaluations). Maybe @cure53 has some ideas for improvement as well.

mathiasbynens commented 9 years ago

I don’t think Unicode normalization is needed here.

cure53 commented 9 years ago

I don't think browsers/DOM should offer such a method, as the scope is very narrow and it will most likely lead developers to false security assumptions. During penetration tests we see many, many cases of PHP's htmlentities and htmlspecialchars being misused in the wild. Developers assume they protected themselves properly against XSS but did not, not knowing about browser behavior and quirks. Or the context strikes back.

Functionality such as escapeHTML can only work properly if used in the correct context and the correct document mode, and it is prone to being bypassed using contenteditable, charsets, SVG, MathML, mXSS and many other attack techniques. There were countless discussions about a feature like this in the past, and none of the approaches took off.

Furthermore, creating and offering such functionality is an invitation for developers to use bad practices. There is usually no reason to pipe user-controlled input into innerHTML and friends. There is textContent and similar properties for that. And if there really is a reason (web mail, browser crypto, real-life editing, etc.), then the last thing you want to do is escape. You want to sanitize instead! Or use sandboxed iframes. Or CSP. I think escaping (or more correctly, encoding) is a tool from the nineties used to solve a problem of today and tomorrow.
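As a small illustration of the textContent point: the function name `renderComment` is hypothetical, and `element` is assumed to be any DOM element (or an object with a `textContent` property).

```javascript
// Hypothetical helper: put user-controlled text into an element without
// ever handing it to the HTML parser.
function renderComment(element, userInput) {
  // textContent treats the value as plain text, so markup in userInput
  // is displayed literally instead of being parsed. No escaping step needed.
  element.textContent = String(userInput);
  return element;
}
```

In a browser this could be called as `renderComment(document.createElement("p"), input)`. The string is stored verbatim, so there is no encoded form to get wrong.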

If however such functionality is indeed required or wished for, the first thing to happen is to think about what it should actually do, so it is crystal-clear for developers where it helps and where it doesn't.

Yaffle commented 9 years ago

@cure53 nothing is perfect, but it seems some JavaScript libs (underscore.js, prototype.js, dojo (dojo/string)) provide a method which works fine when you use it 1) in a quoted HTML attribute value or 2) between HTML tags. Those are popular use cases, no?

jzaefferer commented 6 years ago

Not resolved, but also not caring anymore

jzaefferer / standards-tracker

escapeHTML method #2