Open ConnyOnny opened 8 years ago
Yes, I fully agree about the goal. Unfortunately, I also agree that it is not as easy as it sounds. :-)
I have considered way number 2, and I think it is plausible. A beginning would probably be to convert the ToHtml trait to a more generic ToEncoded<T> trait, where T could be Html, Css, Javascript, etc. Then perhaps we could implement a way to specify the target encoding locally in the template (or at least in Rust functions returning pre-encoded HTML) as a first step, before having the template compiler keep track of what escaping should be used where.
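To make the idea concrete, here is a minimal sketch of what such a generic trait could look like. Everything in it is an assumption for illustration: the marker types, the `to_encoded` method signature and the `encode` helper are hypothetical names, not an existing API, and the JavaScript rules shown are deliberately incomplete.

```rust
use std::fmt::{self, Write};

// Marker types naming the target encoding (hypothetical names).
pub struct Html;
pub struct Javascript;

/// Hypothetical generalisation of a `ToHtml`-style trait: the type
/// parameter `T` selects which escaping rules apply on output.
pub trait ToEncoded<T> {
    fn to_encoded(&self, out: &mut dyn Write) -> fmt::Result;
}

impl ToEncoded<Html> for str {
    fn to_encoded(&self, out: &mut dyn Write) -> fmt::Result {
        for c in self.chars() {
            match c {
                '&' => out.write_str("&amp;")?,
                '<' => out.write_str("&lt;")?,
                '>' => out.write_str("&gt;")?,
                '"' => out.write_str("&quot;")?,
                '\'' => out.write_str("&#39;")?,
                c => out.write_char(c)?,
            }
        }
        Ok(())
    }
}

impl ToEncoded<Javascript> for str {
    // Writes the text as a double-quoted JS string literal.
    // Only the most common escapes are handled in this sketch.
    fn to_encoded(&self, out: &mut dyn Write) -> fmt::Result {
        out.write_char('"')?;
        for c in self.chars() {
            match c {
                '"' => out.write_str("\\\"")?,
                '\\' => out.write_str("\\\\")?,
                '\n' => out.write_str("\\n")?,
                '\r' => out.write_str("\\r")?,
                c => out.write_char(c)?,
            }
        }
        out.write_char('"')
    }
}

/// Convenience helper: encode `value` for target `T` into a new String.
pub fn encode<T, V: ToEncoded<T> + ?Sized>(value: &V) -> String {
    let mut s = String::new();
    ToEncoded::<T>::to_encoded(value, &mut s).expect("writing to a String cannot fail");
    s
}
```

With this shape the target is chosen at the call site, e.g. `encode::<Html, _>(text)` versus `encode::<Javascript, _>(text)`, which is roughly the "specify target encoding locally" step.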
The solution should support nesting of types. <script>var x = @x</script> is text-in-JS-in-CDATA, so a text value there needs JS string encoding, but then also CDATA </ escaping. If @x were JS code, it would still need the layer of CDATA escaping.
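A rough sketch of that layering, with each layer as its own function so they can be composed. The function names are hypothetical, and the outer layer assumes that `</` only ever occurs inside JS string or regex literals, where rewriting it to `<\/` does not change the meaning.

```rust
/// Inner layer: plain text -> JS double-quoted string literal.
/// Only the most common escapes are handled in this sketch.
fn js_string(text: &str) -> String {
    let mut out = String::with_capacity(text.len() + 2);
    out.push('"');
    for c in text.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Outer layer: JS source -> content that cannot close the surrounding
/// <script> element early. Assumes "</" appears only inside literals.
fn script_body(js: &str) -> String {
    js.replace("</", "<\\/")
}

/// Text-in-JS-in-script: apply the inner layer first, then the outer one.
fn text_in_script(text: &str) -> String {
    script_body(&js_string(text))
}
```

A value that is already JS code would skip `js_string` and go through `script_body` only.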
Bonus points for distinguishing between HTML body, attributes (' / "), CDATA (</) and PCDATA (the <title> element is special!).
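One way to picture the distinction is an explicit context value passed to the escaping function. The enum and the exact per-context character sets below are my assumptions for illustration, not a complete or authoritative rule set; the script/CDATA case is left out because entity escaping does not apply there.

```rust
/// Hypothetical set of HTML contexts a template value can land in.
#[derive(Clone, Copy)]
enum Context {
    Body,       // ordinary element content
    AttrDouble, // inside attr="..."
    AttrSingle, // inside attr='...'
    Title,      // <title> content: entities are decoded, tags are not parsed
}

/// Escape `text` for the given context (sketch only).
fn escape(text: &str, ctx: Context) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match (c, ctx) {
            ('&', _) => out.push_str("&amp;"),
            ('<', _) => out.push_str("&lt;"),
            ('>', Context::Body) => out.push_str("&gt;"),
            ('"', Context::AttrDouble) => out.push_str("&quot;"),
            ('\'', Context::AttrSingle) => out.push_str("&#39;"),
            (c, _) => out.push(c),
        }
    }
    out
}
```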
There's also a need to escape elements in URLs, e.g. <a href="url?value=@value">. I'd like the value to be URL-encoded (technically it is URL-in-attribute and needs HTML escaping too; it just happens that URLs usually contain no literal <>" characters).
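As a sketch of the URL-in-attribute case: percent-encode the value first, then HTML-escape the whole URL for the attribute. The helper names are hypothetical, and the percent-encoding keeps only the RFC 3986 unreserved characters, which is stricter than strictly necessary.

```rust
/// Percent-encode a query value, keeping only RFC 3986 unreserved bytes.
fn percent_encode(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Escape a string for use inside a double-quoted HTML attribute.
fn attr_escape(s: &str) -> String {
    s.replace('&', "&amp;").replace('"', "&quot;").replace('<', "&lt;")
}

/// Build the opening tag: URL layer first, attribute layer second.
fn link(base: &str, value: &str) -> String {
    let url = format!("{}?value={}", base, percent_encode(value));
    format!("<a href=\"{}\">", attr_escape(&url))
}
```

The attribute layer rarely changes anything once the value is percent-encoded, but it still matters for the fixed part of the URL, for example an `&` separating query parameters.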
You already mentioned that you need different escaping for the different languages being templated. On the web, however, it is much worse, because a single file mixes several languages. Escaping differs for HTML code in an HTML file, for CSS code in that same file, for JavaScript code in that same file, and for a string constant inside that JavaScript code. Let the programmer (a typical web designer) choose their poison -- ehrm, escaping function -- and they will make mistakes. This leads to exploits, millions of which are already out there. Rust is supposed to be secure, right?
From a templating engine written in Rust in 2016, one expects no less than a safety guarantee that all strings are correctly escaped for their context. This is not as easy as it may sound. I see two ways to do this.
unsafe around it. Or the HTML strings could be parsed dynamically and checked for closedness in the given static context (i.e. the original context is restored after the string). Closedness results for commonly used strings (frequently accessed article HTML from a database) could be cached to avoid running the parser on every execution.

Sadly, of course, the client's browser could use a non-standard parser and build up a completely different AST, in which case they will not be protected. But in 2016, with all the web standards we have, this should be a very rare case.
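To illustrate the dynamic closedness check with caching, here is a deliberately naive sketch. It only balances tags and is not a real HTML parser: comments, `>` inside attribute values and raw-text elements are ignored, and the short void-element list, `is_closed` and `ClosednessCache` are all hypothetical names chosen for this example.

```rust
use std::collections::HashMap;

// Partial list of elements that never take a closing tag (assumption).
const VOID: [&str; 6] = ["br", "hr", "img", "input", "meta", "link"];

/// True if every tag opened in `html` is closed again, in order, and no
/// stray closing tag appears. Anything unparseable counts as not closed.
fn is_closed(html: &str) -> bool {
    let mut stack: Vec<String> = Vec::new();
    let mut rest = html;
    while let Some(start) = rest.find('<') {
        let after = &rest[start + 1..];
        let end = match after.find('>') {
            Some(e) => e,
            None => return false, // unterminated tag
        };
        let tag = &after[..end];
        rest = &after[end + 1..];
        if let Some(name) = tag.strip_prefix('/') {
            // A closing tag must match the most recently opened element.
            let name = name.trim().to_ascii_lowercase();
            if stack.pop() != Some(name) {
                return false;
            }
        } else {
            let name = tag
                .split(|c: char| c.is_whitespace() || c == '/')
                .next()
                .unwrap_or("")
                .to_ascii_lowercase();
            if name.is_empty() {
                return false;
            }
            let is_void = VOID.iter().any(|&v| v == name.as_str());
            if !tag.ends_with('/') && !is_void {
                stack.push(name);
            }
        }
    }
    stack.is_empty()
}

/// Remembers results so frequently used strings are parsed only once.
struct ClosednessCache {
    seen: HashMap<String, bool>,
}

impl ClosednessCache {
    fn new() -> Self {
        ClosednessCache { seen: HashMap::new() }
    }

    fn check(&mut self, html: &str) -> bool {
        if let Some(&known) = self.seen.get(html) {
            return known;
        }
        let result = is_closed(html);
        self.seen.insert(html.to_owned(), result);
        result
    }
}
```

A real implementation would need a standards-conforming parser and a check relative to the surrounding context, not only tag balance; the cache keyed on the string content is the part that carries over.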
What are your thoughts?