rails / rails

Ruby on Rails
https://rubyonrails.org
MIT License
55.91k stars 21.63k forks source link

to_json can emit invalid Javascript if called on a string containing unicode newline characters #10320

Closed nfm closed 11 years ago

nfm commented 11 years ago

If a string contains either the unicode character \u2028 (line separator) or \u2029 (paragraph separator), calling as_json or to_json on the string will output JSON that cannot be parsed as Javascript by the brower.

Firefox will raise SyntaxError: unterminated string literal. Chrome will raise SyntaxError: Unexpected token ILLEGAL. I haven't tested Safari/IE/Opera.

Apparently, these characters are valid in string literals in JSON and will be accepted by JSON.parse, but are treated as line endings by Javascript parsers.

Here's a quick demo:

# some_view.erb
<% foo = "This will cause a JS error\u2028" %>
<%= javascript_tag do %>
  var foo = <%= foo.to_json.html_safe %>;
<% end %>

The following HTML will be output:

<script type="text/javascript">
  //<![CDATA[
    var foo = "This will cause a JS error
";
  //]]>
</script>

Firefox and Chrome will raise syntax errors.

Here's one workaround I've discovered:

# some_view.erb
<% foo = "This will cause a JS error\u2028" %>
<%= javascript_tag do %>
  var foo = $.parseJSON("<%= escape_javascript(foo.to_json).html_safe %>");
<% end %>

escape_javascript will encode \u2028 as '&#x2028;'. It doesn't handle \u2029.

Another workaround is to call gsub or tr after to_json.

I'm not sure what the best approach is to tackling this. Would it be appropriate to submit a patch that modified String#as_json to convert these characters to a \n?

I'm not confident on what the implications of this kind of change would be. There may well be other unicode characters that cause the same issue, too.

Related issues:

vbrendel commented 11 years ago

My workaround for now is to add

foo.to_json.gsub(/\u2028/, '\u2028').gsub(/\u2029/, '\u2029')
steveklabnik commented 11 years ago

This sounds like a bug in browsers, not in Rails...

nfm commented 11 years ago

According to this Stack Overflow answer, it's more like a JSON spec bug - JSON allows characters in string literals that are invalid in Javascript code.

The unicode characters in question are equivalent to something like this, which causes the same syntax error:

var foo = "this string
is multiline";

Obviously you need to use an escape sequence (\n, \u000A, etc) for the above to work.

I guess \u2028 and \u2029 are equivalent to the literal line break, which is why they're not valid in Javascript.

In any case, if you're taking user input, storing it, and then returning it as JSON, Javascript execution will break when it hits either of these characters, so it would be preferable not to output them!

I can roll my own fix for this, but this feels like it belongs in Rails, in the same way as other cases that would result in invalid output (eg " gets escaped) are handled in ActiveSupport::JSON::Encoding.

Do you think \u2028 and \u2029 fall under the same category?

@vbrendel tr should be faster than chaining gsub :)

solnic commented 11 years ago

@steveklabnik this is a regression. it works in rails 3.2.x. I upgraded one of my apps to rails 4 rc1 today and was bitten by this pretty severely.

nfm commented 11 years ago

@solnic I was affected by this issue on 3.2.13.

Thanks for the fix guys!

rafaelfranca commented 11 years ago

@nfm could you check 3-2-stable?

nfm commented 11 years ago

@rafaelfranca Yes, I can confirm that 3.2.13 is affected by this issue, and that it's fixed on 3-2-stable (see https://github.com/nfm/rails-json-u2028-test for a minimal test app).

I notice that e4ec944 was on master - was there another commit on 3-2-stable to fix this that I've missed?

rafaelfranca commented 11 years ago

@nfm thank you for confirm and for the test app.

Yes. There is https://github.com/rails/rails/commit/c910388587220e962682b0b9187e79b8f1641c17 that I made only to 3-2-stable