context aware escaping - Githubissues

ConnyOnny commented 8 years ago

You already mentioned that different target languages need different escaping. On the web it's way worse, because a single file mixes several languages. Escaping differs between HTML code in an HTML file, CSS code in that same file, JavaScript code in that same file, and a string constant inside that JavaScript code. Let the programmer (a typical web designer) choose their poison -- ehrm, escaping function -- and they will make mistakes. That leads to exploits, which are already out there by the millions. Rust is supposed to be secure, right?

From a templating engine written in Rust in 2016, one expects no less than a safety guarantee that all strings are escaped correctly for their context. This is not as easy as it may sound. I see two ways to do this:

1. While writing out the template, have a parser running in the background that keeps track of which context the writer is currently in. I believe the Servo people should have something like that lying around. This would be a dynamically enforced guarantee.
2. Parse the AST of the template without the insertions at compile time, so that the context of each insertion is already known statically. The problem is that you can't modify the AST anymore without telling ructe (e.g. no pasting of a pre-HTML-formatted article text), because that could invalidate ructe's assumption about the context we're supposed to be in. Or it could be allowed if you write unsafe around it. Or the HTML strings could be parsed dynamically and checked for closedness in the given static context (i.e. the original context is restored after the string). Closedness results for commonly used strings (frequently accessed article HTML code from a database) could be cached to avoid running the parser on every execution.
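Both ways need some notion of "which context are we in at this insertion point". A minimal sketch of that idea in Rust, assuming the literal template text before an insertion is available as a string. The names `Context` and `context_at_end` are made up for illustration and are not part of ructe, and the scanner deliberately ignores comments, unquoted attribute values and other corner cases that a real HTML tokenizer handles:

```rust
/// Contexts this sketch distinguishes (a real engine would need more).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Context {
    HtmlBody,
    InTag,
    /// Inside a quoted attribute value; the char is the opening quote.
    AttrValue(char),
    Script,
    Style,
}

/// Hypothetical helper: scan the literal template text that precedes an
/// insertion and report the context the insertion lands in.
pub fn context_at_end(literal: &str) -> Context {
    // ASCII lowercasing keeps byte offsets intact and makes tag names
    // case-insensitive.
    let text = literal.to_ascii_lowercase();
    let mut rest = text.as_str();
    let mut ctx = Context::HtmlBody;
    let mut tag_name = String::new();
    let mut in_name = false;

    while let Some(c) = rest.chars().next() {
        let mut advance = c.len_utf8();
        match ctx {
            Context::HtmlBody => {
                if c == '<' {
                    ctx = Context::InTag;
                    tag_name.clear();
                    in_name = true;
                }
            }
            Context::InTag => {
                if c == '>' {
                    // The tag just closed; its name decides what follows.
                    ctx = match tag_name.as_str() {
                        "script" => Context::Script,
                        "style" => Context::Style,
                        _ => Context::HtmlBody,
                    };
                } else if c == '"' || c == '\'' {
                    ctx = Context::AttrValue(c);
                    in_name = false;
                } else if in_name && (c.is_ascii_alphanumeric() || c == '/') {
                    tag_name.push(c);
                } else {
                    in_name = false;
                }
            }
            Context::AttrValue(quote) => {
                if c == quote {
                    ctx = Context::InTag;
                }
            }
            Context::Script | Context::Style => {
                // Raw text: only the matching end tag leaves this context.
                let close = if ctx == Context::Script { "</script" } else { "</style" };
                if rest.starts_with(close) {
                    ctx = Context::InTag;
                    tag_name = close[1..].to_string();
                    in_name = false;
                    advance = close.len();
                }
            }
        }
        rest = &rest[advance..];
    }
    ctx
}
```

With something like this, the template compiler could pick the escaping function per insertion, for example treating `<a href="url?value=` as ending in `Context::AttrValue('"')` and `<script>var x = ` as ending in `Context::Script`.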

Sadly, of course, the client's browser could use a non-standard parser and build up a completely different AST, in which case the user will not be protected. But in 2016, with all the web standards we have, this should be a very rare case.

What are your thoughts?

kaj commented 8 years ago

Yes, I fully agree about the goal. Unfortunately, I also agree that it is not as easy as it sounds. :-)

I have considered way number 2, and I think it is plausible. A start would probably be to convert the ToHtml trait to a more generic ToEncoded<T> trait, where T could be Html, Css, Javascript, etc. Then perhaps we could implement a way to specify the target encoding locally in the template (or at least in Rust functions returning pre-encoded HTML) as a first step, before having the template compiler keep track of what escaping should be used where.
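A rough sketch of what such a trait could look like, using only the standard library. Everything here is an assumption for illustration: the marker types `Html` and `JsString`, the method name `to_encoded`, the `Preencoded` wrapper and the `encode` helper are invented names, not ructe's actual API.

```rust
use std::fmt::{self, Write};
use std::marker::PhantomData;

/// Marker types naming the target encoding (hypothetical).
pub struct Html;
pub struct JsString;

/// Hypothetical generalisation of a ToHtml-style trait.
pub trait ToEncoded<T> {
    fn to_encoded(&self, out: &mut dyn Write) -> fmt::Result;
}

impl ToEncoded<Html> for str {
    fn to_encoded(&self, out: &mut dyn Write) -> fmt::Result {
        for c in self.chars() {
            match c {
                '&' => out.write_str("&amp;")?,
                '<' => out.write_str("&lt;")?,
                '>' => out.write_str("&gt;")?,
                '"' => out.write_str("&quot;")?,
                '\'' => out.write_str("&#39;")?,
                _ => out.write_char(c)?,
            }
        }
        Ok(())
    }
}

impl ToEncoded<JsString> for str {
    /// Writes a double-quoted JavaScript string literal.
    fn to_encoded(&self, out: &mut dyn Write) -> fmt::Result {
        out.write_char('"')?;
        for c in self.chars() {
            match c {
                '"' => out.write_str("\\\"")?,
                '\\' => out.write_str("\\\\")?,
                '\n' => out.write_str("\\n")?,
                '\r' => out.write_str("\\r")?,
                // Written as \uXXXX so the literal stays inert inside HTML.
                '<' | '>' | '&' | '\u{2028}' | '\u{2029}' => {
                    write!(out, "\\u{:04X}", c as u32)?
                }
                _ => out.write_char(c)?,
            }
        }
        out.write_char('"')
    }
}

/// Text that a Rust function has already encoded for target `T`.
pub struct Preencoded<T> {
    text: String,
    _target: PhantomData<T>,
}

impl<T> Preencoded<T> {
    pub fn new(text: impl Into<String>) -> Self {
        Preencoded { text: text.into(), _target: PhantomData }
    }
}

impl<T> ToEncoded<T> for Preencoded<T> {
    /// Passed through unchanged, but only for the encoding it was made for.
    fn to_encoded(&self, out: &mut dyn Write) -> fmt::Result {
        out.write_str(&self.text)
    }
}

/// Convenience helper: encode `value` for target `T` into a String.
pub fn encode<T, V: ToEncoded<T> + ?Sized>(value: &V) -> String {
    let mut out = String::new();
    value.to_encoded(&mut out).expect("writing to a String cannot fail");
    out
}
```

The point of the type parameter is that `encode::<Html, _>("a < b")` and `encode::<JsString, _>("a < b")` pick different escaping for the same value, while a `Preencoded<Html>` value simply does not implement `ToEncoded<JsString>`, so using it in the wrong context fails to compile.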

kornelski commented 6 years ago

The solution should support nesting of types. <script>var x = @x</script> is text-in-JS-in-CDATA, so a text value there needs JS string encoding, and then also CDATA </ escaping. If @x were JS code, it would still need the layer of CDATA escaping.
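To make the layering concrete, here is a small sketch in Rust with two separate steps: one that turns text into a JS string literal, and one that makes JS source safe to place inside a script element. Both function names are hypothetical. The second step rewrites `</` as `<\/` and `<!--` as `<\!--`, which leaves the JS meaning unchanged only when those sequences occur inside string literals, so treat it as an illustration of the layering rather than a complete answer for arbitrary JS code.

```rust
/// Layer 1 (hypothetical): encode text as a double-quoted JS string literal.
/// Deliberately does NOT know anything about HTML.
pub fn js_string_literal(text: &str) -> String {
    let mut out = String::with_capacity(text.len() + 2);
    out.push('"');
    for c in text.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\u{2028}' => out.push_str("\\u2028"),
            '\u{2029}' => out.push_str("\\u2029"),
            // Remaining control characters as \uXXXX escapes.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04X}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Layer 2 (hypothetical): make JS source safe as the body of a script
/// element, so the HTML parser cannot see an end tag or a comment opener.
/// Assumption: "</" and "<!--" only occur inside JS string literals, where
/// the added backslash is a no-op escape.
pub fn script_element_body(js: &str) -> String {
    js.replace("</", "<\\/").replace("<!--", "<\\!--")
}
```

For a text value the two layers compose as `script_element_body(&js_string_literal(x))`; for a value that already is JS code, only the outer layer applies.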

Bonus points for distinguishing between HTML body, attributes ('/"), CDATA (</) and PCDATA (the <title> element is special!).

There's also a need to escape elements in URLs. E.g. in <a href="url?value=@value"> I'd like the value to be URL-encoded. (It technically is URL-in-attribute and needs HTML escaping too; it just happens that there usually aren't literal <>" in the URLs.)
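The URL-in-attribute case can be sketched the same way: percent-encode the value as a URL component first, then HTML-escape the whole URL for the attribute. The function names below are invented for this example, and the percent-encoding keeps only the RFC 3986 unreserved characters, which is stricter than necessary but simple.

```rust
/// Inner layer (hypothetical): percent-encode a value for use as a URL
/// query component. Only RFC 3986 "unreserved" characters pass through.
pub fn percent_encode_component(value: &str) -> String {
    let mut out = String::with_capacity(value.len());
    for b in value.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            // Everything else, including each byte of multi-byte UTF-8.
            _ => out.push_str(&format!("%{:02X}", b)),
        }
    }
    out
}

/// Outer layer (hypothetical): escape text for a quoted HTML attribute.
pub fn html_attr_escape(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            '&' => out.push_str("&amp;"),
            '<' => out.push_str("&lt;"),
            '>' => out.push_str("&gt;"),
            '"' => out.push_str("&quot;"),
            '\'' => out.push_str("&#39;"),
            c => out.push(c),
        }
    }
    out
}

/// Build the content of an href attribute: `url_prefix` is trusted literal
/// template text, `value` is the untrusted insertion.
pub fn href_attr(url_prefix: &str, value: &str) -> String {
    let url = format!("{}{}", url_prefix, percent_encode_component(value));
    html_attr_escape(&url)
}
```

After percent-encoding, the value itself no longer contains any of <>"&, so the HTML layer only changes the literal parts of the URL, such as a `&` between query parameters.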

kaj / ructe

context aware escaping #1