zupper77 / google-code-prettify

Automatically exported from code.google.com/p/google-code-prettify
Apache License 2.0

Prettify doesn't work properly with Unicode characters in identifiers #269

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Highlight some code containing non-ASCII characters in identifiers, for 
example, this C# code:

namespace Matematisk_indlæring
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
    }
}

(from http://stackoverflow.com/questions/15265459/trouble-with-if-circulating)

(Please include HTML, not just your source code)

What is the expected output?  What do you see instead?
Observe that the "class" highlighting stops before the æ since it is not 
considered an identifier character. æ is a legal C# identifier character, so 
it should be highlighted as such.

What version are you using?  On what browser?
StackOverflow's prettify; Firefox 19.

Please provide any additional information below.

Original issue reported on code.google.com by nneon...@gmail.com on 7 Mar 2013 at 7:43

GoogleCodeExporter commented 8 years ago
Yeah.  The problem is that JavaScript regular expressions don't have Unicode 
character classes, so to handle all non-Latin letters/digits and even the Latin 
ligatures, I have to list every contiguous code-point range.

I've been loath to do that since it means shipping a lot of code that is 
rarely used, but I haven't actually tried to quantify the amount of code that 
would be required.  I'll look into it.

Original comment by mikesamuel@gmail.com on 7 Mar 2013 at 1:58
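
To illustrate the range-listing approach described in the comment above, here is a minimal sketch that builds an identifier regex from an explicit table of code-point ranges. The table `LETTER_RANGES` and the helpers `toEscape`, `buildClass`, and `findIdentifiers` are hypothetical names made up for this example and are not part of prettify; the table is deliberately tiny, while a complete one would need hundreds of entries, which is the code-size cost mentioned above.

```javascript
// Hypothetical, deliberately incomplete table of letter ranges
// (inclusive BMP code points). Not taken from prettify.
const LETTER_RANGES = [
  [0x0041, 0x005A], // A-Z
  [0x0061, 0x007A], // a-z
  [0x00C0, 0x00D6], // À-Ö
  [0x00D8, 0x00F6], // Ø-ö (includes æ, U+00E6)
  [0x00F8, 0x00FF], // ø-ÿ
  [0x0410, 0x044F]  // Cyrillic А-я
];

// Render one code point as a \uXXXX escape for use inside a character class.
function toEscape(codePoint) {
  return '\\u' + codePoint.toString(16).padStart(4, '0');
}

// Turn the range table into the body of a character class.
function buildClass(ranges) {
  return ranges
    .map(([lo, hi]) => (lo === hi ? toEscape(lo) : toEscape(lo) + '-' + toEscape(hi)))
    .join('');
}

const LETTERS = buildClass(LETTER_RANGES);

// A letter or underscore, followed by letters, ASCII digits, or underscores.
const IDENTIFIER = new RegExp('[' + LETTERS + '_][' + LETTERS + '0-9_]*', 'g');

// Return every identifier-like run found in the source text.
function findIdentifiers(source) {
  return source.match(IDENTIFIER) || [];
}
```

With this table, `findIdentifiers('namespace Matematisk_indlæring')` keeps `Matematisk_indlæring` together as one token instead of stopping before the æ.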

GoogleCodeExporter commented 8 years ago
XRegExp (http://xregexp.com/) extends JavaScript regexes with Unicode 
character classes. You need the Unicode Base 1.0.0 addon to get the Letter 
category (listed under addons).

Original comment by xanato...@gmail.com on 7 Mar 2013 at 2:46
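
For comparison, here is a sketch of what a Letter-category match looks like. It does not use XRegExp; it swaps in the native `\p{L}` property escape, which assumes an ES2018 or later engine and was not available when this thread was written. The names `IDENTIFIER_UNICODE` and `findUnicodeIdentifiers` are hypothetical.

```javascript
// Assumes an ES2018+ engine: the `u` flag enables \p{...} property escapes.
// \p{L} matches any Unicode letter, \p{Nd} any decimal digit.
const IDENTIFIER_UNICODE = /[\p{L}_][\p{L}\p{Nd}_]*/gu;

// Return every identifier-like run found in the source text.
function findUnicodeIdentifiers(source) {
  return source.match(IDENTIFIER_UNICODE) || [];
}
```

This keeps the pattern short because the engine supplies the letter tables, at the cost of failing to compile on engines without property-escape support.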

GoogleCodeExporter commented 8 years ago
Are there any plans for this issue? This is an important part of translating 
code for educational purposes. I'm tempted to replace the regex with a much 
more lenient one in my local version.

Original comment by john.gralyan@gmail.com on 5 Apr 2015 at 9:20
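
One possible shape for such a lenient regex, as a sketch only: treat every non-ASCII code unit as an identifier character rather than listing the real letter ranges. The names `LENIENT_IDENTIFIER` and `findLenientIdentifiers` are hypothetical, and this is not the pattern prettify ships.

```javascript
// Lenient pattern: ASCII identifier characters plus anything from U+0080 up.
// It over-accepts on purpose, so non-ASCII punctuation also counts as
// part of an identifier.
const LENIENT_IDENTIFIER = /[A-Za-z_$\u0080-\uFFFF][\w$\u0080-\uFFFF]*/g;

// Return every identifier-like run found in the source text.
function findLenientIdentifiers(source) {
  return source.match(LENIENT_IDENTIFIER) || [];
}
```

The trade-off is precision: a string such as `a«b` is matched as a single token because « (U+00AB) falls inside the catch-all range, which may be acceptable for highlighting but would be wrong for a real lexer.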