judofyr / temple

Template compilation framework in Ruby
http://judofyr.net/posts/temple.html
MIT License
491 stars 53 forks source link

WIP: Always escape quotes in html attribute values #109

Closed doits closed 7 years ago

doits commented 7 years ago

By having quotes inside html attribute values, the generated html code can get wrong. For example

[:html, :tag, 'div',
  [:html, :attrs, [
    [:html, :attr, 'data-something', [:static, '"test']]
  ], [:content]
]

compiles to

<div data-something=""test"></div>

Notice the " closes data-something which obviously is not intended. To fix this, all quotes now always get escaped, so generated code becomes:

<div data-something="&quot;test"></div>

" is escaped to &quot; and ' is escaped to &#34, so both delimiters can be used.


The idea to fix this originates from using Slim and the issue https://github.com/slim-template/slim/issues/464.

The fix is based on https://groups.google.com/forum/#!topic/ruby-security-ann/8B2iV2tPRSE, where Ruby on Rails fixed their generators by always escaping " with &quot; regardless whether it is html_safe? or not. I just made sure both delimiters (" and ') are escaped.

It is WIP because I am not sure if the current implementation is the cleanest/fastest. Maybe someone with more experience in hacking Temple can suggest a cleaner/simpler way to do it.

By having this fixed in Temple, all libraries generating HTML automatically get this fix. Otherwise each library would have to make sure the quotes are correctly escaped.

I'm not 100 % sure if there are any side effects of this (eg. HTML code being valid wo this PR gets invalid), but a quick test on my side revealed no errors.

k0kubun commented 7 years ago

This is backward-incompatible. If escaping is already cared, it'll be double escaped. And I don't think this is Temple::HTML::Fast's responsibility because some implementation (like Haml) optionally enables HTML-escaping. Thus it should be separated to another expression or filters.

My solution for the vulnerability (ignoring html_safe or not) in Hamlit is this: https://github.com/k0kubun/hamlit/blob/v2.8.1/lib/hamlit/force_escapable.rb

I want this kind of thing to be ported to Temple.

doits commented 7 years ago

This is backward-incompatible

Yeah, since it changes (before invalid) output. Might be a candidate for a bigger version bump.

If escaping is already cared, it'll be double escaped.

I don't think so, because escaping quotes multiple times yields the same result (&quot; and &#39 stay the same, even if the input was escaped already). It is only escaping the quotes, nothing more.

And I don't think this is Temple::HTML::Fast's responsibility because some implementation (like Haml) optionally enables HTML-escaping

Right, regular complete HTML escaping is not touched at all. It only makes sure the HTML is valid, because IMO if I have [:html, :attr, ...] I'd expect it gives me back syntactical correct HTML (at least that I have an attribute with that value, because that is what this expression says).

So if the implementation already makes sure that it is correctly escaped (eg it already passes down [:html, :attr, :whatever, [:static, "&quot;value"]]), the output will not change.

My solution for the vulnerability (ignoring html_safe or not) in Hamlit is this: https://github.com/k0kubun/hamlit/blob/v2.8.1/lib/hamlit/force_escapable.rb

The difference here is that it does not escape everything by force, but only the quotes and only in tag attribute values. And it only does it because otherwise the HTML-code is broken, so IMO it should not be optional (who optionally wants to have broken HTML output?) ... or at least for the end user it should not be optional. An option for the implementation might be OK, so it can say "Don't touch my quotes, I know what I do" (like Hamlit) - but then it would only be a difference in terms of speeds, at least for what I currently see after hacking this together.


In general this needs some more testing for sure (it is a quick hack this afternoon after digging through Slim and Temple source code ...), and maybe there is some input where it breaks things. I have to test it more on real world HTML too, but I wanted to get this out for review fast. If it breaks something in Hamlit (or somewhere else) I'd be happy to see an example.

k0kubun commented 7 years ago

I don't think so, because escaping quotes multiple times yields the same result

Ah yes, you're absolutely right. I didn't see your code well.

And it only does it because otherwise the HTML-code is broken

Umm? This part didn't make sense. Completely HTML-escaped attribute characters are unescaped by HTML parser in browser and you can get the unescaped value by getAttribute in JavaScript. When you put HTML in attribute value (with HTML-escaped form), you will get and use it from JavaScript and it will work. I don't think escaping HTML in attribute value is a problem. If it's a problem, Haml has had it for a long time...

If it breaks something in Hamlit (or somewhere else) I'd be happy to see an example.

The behavior isn't broken, but the performance will definitely get worse. Calling gsub on rendering time is very slow. Haml and Hamlit don't generate such ineffective code (at least for HTML-escaping attribute values, Ruby 2.3+ is required for Haml). For performance, we should escape anything with C or Java extension, something other than Ruby's gsub. So, if this PR were merged, I'd stop using this filter. "difference in terms of speeds" matters on production system. That's why this functionality must be optional.

Then, why don't you create this filter as another subclass? At least this filter won't be "Fast". Is there any good reason to introduce breaking change and big version bump even if it can be avoided?

doits commented 7 years ago

And it only does it because otherwise the HTML-code is broken

Umm? This part didn't make sense. Completely HTML-escaped attribute characters are unescaped by HTML parser in browser and you can get the unescaped value by getAttribute in JavaScript.

Yes, if the value in [:html, :attr, 'tag', value] is escaped (e.g. by escape_html), then we have no problem. But this means everything I pass to this expr has to be escaped manually, so every implementation must do this on their own (which not every implementation does right, for example Slim). My idea is that whatever I pass as value should work, whether it is escaped already or not, and every implementation gets this "fix" automatically.

If value is not escaped and contains quotes, the HTML-code will be broken (e.g. generating something like <div title="<img src="/path/to/file.png"></img>"></div> instead of <div title="<img src=&quot;/path/to/file.png&quot;></img>"></div>).

By only escaping the quotes (currently with slow gsub quote_html), it makes sure it does not double escape if input is escaped already and nothing breaks (otherwise it simply could use escape_html). Maybe there is some C binding out there for quote_html.

But the main question is: Is this behaviour reasonable? Imagine there was a faster way to replace the quotes with their escaped values - would this be a good feature then? If yes, we can optimize this speed wise, do some benchmarks etc and see where it goes.

Then, why don't you create this filter as another subclass? At least this filter won't be "Fast". Is there any good reason to introduce breaking change and big version bump even if it can be avoided?

Yeah, this is an idea, too. So it would be disabled by default and implementations at least could opt in to use it. This would also allow to bring this into the library and implementations could start to play with it, so maybe it gets some testing.

I will write a filter that does this and see where it goes with it.

k0kubun commented 7 years ago

But the main question is: Is this behaviour reasonable?

I don't know. As I said before, I will never use it because Hamlit resolves the vulnerability by another approach and Haml will follow it for the same reason as Hamlit (consistency of escaping and simplicity) if I implement.

So, you should consider it yourself with your use case https://github.com/slim-template/slim/issues/464. Framework that doesn't have real use case will fail. Thus ideal approach to have this filter in Temple is introducing this to Slim first, then backport it to temple allowing opt-in if it's considered convenient regardless of language. That's the way Hamlit has done https://github.com/judofyr/temple/pull/95 https://github.com/judofyr/temple/pull/96.

Anyway, current diff that doesn't have opt-in won't be merged. And I expect you to have "the ideal approach". So closing this PR. Reopen this PR after the behavior is accepted in at least one widely-used template language that uses temple. A feature without use case won't be applied to any framework.

doits commented 7 years ago

OK, in general I see this more as a "bug fix" for Temple (because [:html, :attr, ...] produces invalid HTML with certain inputs) - that is why I did not implement it opt in in the first case. But I understand you want to keep Temple more simple and faster (speaking not to validate or modify the input of implementations beyond simple transformations), so that each implementation must make sure to pass in correct data. (Maybe a hint in documentation for [:html, :attr, ...] that input must be html encoded is be a good thing then.)

I'll see what I have to change in Slim then to fix this issue then.