ExplodingCabbage closed this issue 10 years ago.
Good question!
The main goal of he is to encode non-ASCII symbols into HTML entities, and to be able to decode these in all their forms, i.e. `he.encode()` and `he.decode()`.
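To make the "encode" half concrete, here is a minimal sketch of the idea. It is not he's implementation or its exact output format. The hypothetical `encodeNonAscii()` turns every non-ASCII code point into a hexadecimal numeric character reference and leaves ASCII alone:

```javascript
// Hypothetical helper for illustration only; this is not he.encode().
// Every non-ASCII code point becomes a hexadecimal numeric character
// reference (e.g. "©" -> "&#xA9;"); ASCII characters pass through as-is.
function encodeNonAscii(input) {
  let out = "";
  // for...of iterates by code point, so an astral symbol such as U+1D306
  // is encoded once rather than as two separate surrogate halves.
  for (const ch of input) {
    const cp = ch.codePointAt(0);
    out += cp > 0x7f ? `&#x${cp.toString(16).toUpperCase()};` : ch;
  }
  return out;
}

console.log(encodeNonAscii("foo © bar ≠ baz 𝌆 qux"));
// prints: foo &#xA9; bar &#x2260; baz &#x1D306; qux
```

This sketch does not touch `&`, `<` or `>`, so on its own it is not an HTML-escaping function. Astral symbols are one of the details that are easy to get wrong if you iterate over a JavaScript string by UTF-16 code unit instead of by code point.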
The encoding part is probably most useful in a build script, or in a Node.js application that outputs the data as part of a response. The decoding part is the hardest (and probably the main reason why one would use he), as there are so many different ways to encode each character, and there are a lot of weird exceptions and edge cases. If you want to decode HTML entities according to the spec, in any environment, then you definitely need he.
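To give a feel for why decoding is the hard part, here is a deliberately incomplete sketch. The hypothetical `decodeSketch()` is not he's implementation: it handles only semicolon-terminated references and a five-entry sample of named references. Even this tiny version has to recognize several spellings of the same character, and the comments list spec details it still gets wrong.

```javascript
// Hypothetical, deliberately incomplete decoder for illustration; not he.decode().
// A tiny sample of named references (the HTML spec defines over 2,000).
const NAMED = { amp: "&", lt: "<", gt: ">", quot: '"', copy: "\u00A9" };

// The spec maps NULL, surrogates and out-of-range code points to U+FFFD.
function fromCodePointSafe(cp) {
  if (cp === 0 || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff)) {
    return "\uFFFD";
  }
  return String.fromCodePoint(cp);
}

function decodeSketch(input) {
  // Still missing: the full named-reference table, legacy references without
  // a trailing semicolon (e.g. "&amp"), the remapping of &#x80;-&#x9F; to
  // windows-1252 characters, and the special rules inside attribute values.
  // Matches &#x26; / &#X26; (hex), &#38; (decimal) and &amp; (named),
  // but only when terminated by a semicolon.
  return input.replace(
    /&(?:#[xX]([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z][a-zA-Z0-9]*));/g,
    (match, hex, dec, name) => {
      if (hex !== undefined) return fromCodePointSafe(parseInt(hex, 16));
      if (dec !== undefined) return fromCodePointSafe(parseInt(dec, 10));
      // Unknown names are left untouched.
      return Object.prototype.hasOwnProperty.call(NAMED, name) ? NAMED[name] : match;
    }
  );
}

console.log(decodeSketch("&amp; &#38; &#x26; &#X26; &#0038;"));
// prints: & & & & &
console.log(decodeSketch("&amp"));
// prints: &amp   (a browser's HTML parser would produce "&" here)
```

The last line shows the kind of edge case a spec-compliant decoder has to get right and a quick regex does not.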
On the client side, at run-time, escaping non-ASCII symbols (like `he.encode()` does) before assigning a string to `.innerHTML` won’t really make a difference: only escaping the unsafe characters matters in that case.
If your only goal is to escape HTML like the `he.escape()` helper method does, then he is probably overkill.
While the `<textarea>` hack works, it feels very hacky to me, and it won’t work in non-browser environments (as you mentioned). Even in browsers it might give results that violate the spec. Yep, some browsers have buggy implementations of named character references: try http://mathias.html5.org/tests/html/named-character-references/ in IE, for example, and try older browser versions too.
Just `.replace()`-ing the characters as needed (like `he.escape()` or `_.escape()` do) seems much simpler and less hacky, ensures the output is predictable and deterministic, and is probably faster, too.
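For the escaping-only case, a `.replace()`-based approach might look like the following sketch. `escapeHtml()` is a hypothetical name, and the escaped character set (`&`, `<`, `>`, `"`, `'`) is an assumption in the spirit of `he.escape()` and `_.escape()`, not a copy of either library's code:

```javascript
// Hypothetical helper in the spirit of he.escape() / _.escape(); not either
// library's actual source. Only characters that are unsafe in HTML text and
// quoted attribute values are replaced; everything else, including
// non-ASCII symbols, passes through unchanged.
const ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#x27;",
};

function escapeHtml(text) {
  // A single regex pass with a fixed mapping: no DOM is involved, so the
  // result is the same in Node.js and in every browser.
  return String(text).replace(/[&<>"']/g, (ch) => ESCAPES[ch]);
}

console.log(escapeHtml('<script>alert("lol")</script> © ≠'));
// prints: &lt;script&gt;alert(&quot;lol&quot;)&lt;/script&gt; © ≠
```

Because the input is never parsed as HTML, nothing in it can be executed along the way.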
Thanks for the reply - I think it resolves my question fully. BTW, I went ahead and posted an answer on SO about your library. Naturally, feel free to tweak it if you reckon I've missed anything important or said anything dumb. :)
:+1:
The punchline to all this, which might interest you: you were right to be turned off by the `<textarea>` hack. It turns out that in jQuery 1.8 and below, the code given in http://stackoverflow.com/a/1395954/1709587 is XSS-vulnerable, because `.html()` in those versions of jQuery would explicitly and deliberately run scripts in the given HTML string. A commenter gives the example of `$("<textarea/>").html('<script>alert("lol")</script>').text()`, which shows an alert on jQuery 1.7.
I am glad to have offered up your library as an alternative answer, but sad to have polished up the insecure `<textarea>` answer and edited in reassurances about it being secure. :( Fixing it now.
Good update to the question. :+1:
Very nice and thoughtful reply, too. Indeed, jQuery 1.8 and below runs scripts contained in HTML strings, and this is deliberate. It's useful in some situations: I remember once making a Tumblr theme with infinite scrolling that needed to execute `<script>` tags to enable dynamic content, because of how limited Tumblr's theming interface is. It only allows entire pieces of HTML to be inserted into the page (that is, if you want non-JS compatibility).
Nice discussion
Consider the following Stack Overflow answer to the question "How to decode HTML entities using jQuery?"
Previously, the answer just included the first code snippet. I recently edited the answer to note the rationale behind using a `textarea` instead of a `div`. However, I'm a little uneasy, because I know that your library exists and is not (as far as I can tell) strictly targeting Node users. I find myself wondering why.

I'll probably post a link to this library as an answer to that question regardless (unless you'd like to do so yourself), since I figure that people who are using Node may benefit from having a single solution that is usable both client-side and server-side. But how about everyone else? What reason is there for anyone to serve a 300-line script for a purpose that can, it seems to my naive eyes, be achieved in 50 characters with a clever hack?
Are there any situations at all in which the `textarea` hack fails (or at least is not guaranteed by the spec to succeed)? I confess to being slightly uneasy about it, since I don't know where (or, for that matter, if) the spec determines the behaviour of browsers when presented with HTML elements containing disallowed children, like …, but from the testing I've done, it seems to work.
Sorry to offload a question like this onto you, but it seems to be right in your area of expertise and is relevant when figuring out to whom this library is useful. (Indeed, if there is something profoundly wrong with the `textarea` hack, it almost seems worth noting in this library's README; otherwise, the case for using a library for this purpose at all is unclear.)