mloayzagahona / plovr

Automatically exported from code.google.com/p/plovr

UTF-8 encoding failure #53

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Set output-charset to UTF-8 in the configuration file
2. Build in either serve or build mode, and make sure the project uses soy files
3. Run the application

The generated script file fails to load with the error 'Unexpected token, ' or 
similar.

The problem is caused by unescaped Unicode line-terminator characters. At 
least \u2028 and \u2029, which soyutils_usegoog.js uses, must be escaped in the 
output even when it is UTF-8 encoded.

By switching to US_ASCII encoding the problem is resolved.
Tested both on Mac and Windows, all major browsers.

The issue was introduced a couple of releases ago; it is not specific to the 
latest release.

Original issue reported on code.google.com by stefand...@gmail.com on 27 Nov 2011 at 4:44

GoogleCodeExporter commented 9 years ago
When you serve the JS (or is this when plovr serves the JS), are you setting 
the charset? The easiest way to do this is via the script tag:

<script type="text/javascript" src="myscripts.js" charset="UTF-8"></script>

Original comment by bolinf...@gmail.com on 27 Nov 2011 at 10:12

GoogleCodeExporter commented 9 years ago
I should have mentioned what is specific to my setup: I serve the scripts
from a CDN.
The script tags are generated by the closure library, and
goog.module.ModuleLoader.createScriptElement_ does not set the charset
attribute.
I will override that method to set the attribute appropriately and let you
know the result. Thanks for the hint.
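
For reference, a minimal sketch of creating a script element with the charset
attribute set. This is not the closure library's implementation;
createScriptWithCharset and its parameters are hypothetical names used only
for illustration, and `doc` is assumed to be a DOM-style document object.

```javascript
// Hypothetical helper, not part of the closure library or plovr.
// Builds a <script> element equivalent to:
//   <script type="text/javascript" src="..." charset="..."></script>
function createScriptWithCharset(doc, src, charset) {
  var script = doc.createElement('script');
  script.setAttribute('type', 'text/javascript');
  // Declare the encoding of the external file explicitly.
  script.setAttribute('charset', charset);
  script.setAttribute('src', src);
  return script;
}

// In a browser, the element would then be attached to the page, e.g.:
//   document.getElementsByTagName('head')[0].appendChild(
//       createScriptWithCharset(document, 'myscripts.js', 'UTF-8'));
```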

Original comment by stefand...@gmail.com on 28 Nov 2011 at 7:10

GoogleCodeExporter commented 9 years ago
Note this is also documented in http://plovr.com/options.html#output-charset

Original comment by bolinf...@gmail.com on 28 Nov 2011 at 3:38

GoogleCodeExporter commented 9 years ago

Original comment by bolinf...@gmail.com on 28 Nov 2011 at 3:38

GoogleCodeExporter commented 9 years ago
I tested the recommended change and set the 'charset' attribute of the script
tag to 'UTF-8'.
It fails the same way in all major browsers. The attribute is set as
expected, and the entire page is also UTF-8 encoded.

I do not consider the problem a blocker, since encoding as US-ASCII is
adequate in most cases.
But I would change the documentation for UTF-8 encoding: unless I missed
something, it does not work.

Original comment by stefand...@gmail.com on 28 Nov 2011 at 6:12

GoogleCodeExporter commented 9 years ago
I believe the problem is not the encoding itself. The character \u2028,
Unicode line separator, must be escaped in javascript.
http://stackoverflow.com/questions/2965293/javascript-parse-error-on-u2028-unicode-character
I am not sure about \u2029, but both \u2028 and \u2029 are used by the
closure soy files.
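
As a rough illustration of the escaping described here, raw U+2028 and U+2029
characters can be replaced with their \uXXXX escape sequences before the
script is written out. This is only a sketch, not the fix used by plovr or
closure-templates; escapeLineSeparators is a hypothetical helper name.

```javascript
// Hypothetical helper, not part of plovr or the closure tools.
// Replaces raw U+2028 / U+2029 characters with their escape sequences so the
// generated source contains only the six-character text \u2028 or \u2029.
function escapeLineSeparators(source) {
  return source.replace(/[\u2028\u2029]/g, function (ch) {
    return ch === '\u2028' ? '\\u2028' : '\\u2029';
  });
}

// Generated source with a raw U+2028 inside a string literal.
var raw = 'var s = "a' + String.fromCharCode(0x2028) + 'b";';

// After escaping, the literal contains a backslash escape instead of the raw
// character, while the string value it denotes stays the same.
var escaped = escapeLineSeparators(raw);
```

Because only the source text changes, code that reads the string at runtime
still sees the original character.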

Original comment by stefand...@gmail.com on 28 Nov 2011 at 6:30

GoogleCodeExporter commented 9 years ago
There is a bug in closure-templates related to this right now:

http://code.google.com/p/closure-templates/issues/detail?id=52

Original comment by mark...@gmail.com on 27 Feb 2012 at 1:15