tulipcc / ParserGeneratorCC

A simple parser generator written in Java (fork of JavaCC 7.0.3) and used in ph-javacc-maven-plugin

String in character list may contain only one character on valid unicode escape sequence #34

Closed: apixandru closed this issue 2 years ago

apixandru commented 2 years ago

The following character literal is valid Java and should be parsed correctly:

char c = '\u005c\u005c';
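The literal is valid because Java translates Unicode escapes before it tokenizes the source (JLS §3.3). Each \u005c becomes a backslash, so the lexer actually sees '\\', the ordinary escaped-backslash character. The sketch below illustrates only that translation step. The class UnicodeEscapes and its translate method are hypothetical names. It follows the JLS rule that a backslash preceded by an odd number of raw backslashes cannot start an escape, and it copies malformed escapes through unchanged instead of reporting the compile-time error the JLS requires.

```java
public final class UnicodeEscapes {

    /** Sketch of JLS 3.3 Unicode-escape translation; malformed escapes are copied through unchanged. */
    static String translate(String src) {
        StringBuilder out = new StringBuilder(src.length());
        int rawBackslashRun = 0; // contiguous raw backslashes immediately before position i
        int i = 0;
        while (i < src.length()) {
            char ch = src.charAt(i);
            // A backslash may start an escape only if an even number of raw backslashes precede it.
            if (ch == '\\' && rawBackslashRun % 2 == 0) {
                int j = i + 1;
                while (j < src.length() && src.charAt(j) == 'u') {
                    j++; // one or more 'u' characters are allowed
                }
                if (j > i + 1 && isHex4(src, j)) {
                    out.append((char) Integer.parseInt(src.substring(j, j + 4), 16));
                    i = j + 4;
                    rawBackslashRun = 0; // the translated character is not a raw backslash
                    continue;
                }
            }
            out.append(ch);
            rawBackslashRun = (ch == '\\') ? rawBackslashRun + 1 : 0;
            i++;
        }
        return out.toString();
    }

    /** True if four ASCII hex digits start at position from. */
    private static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            char c = s.charAt(k);
            boolean hex = (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String source = "char c = '\\u005c\\u005c';";
        System.out.println(source);            // the raw text, as written in the issue
        System.out.println(translate(source)); // char c = '\\';
    }
}
```

A grammar that sees the raw, untranslated text therefore has to accept the six-character spelling of the backslash wherever a plain backslash may appear.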

While using the JavaParser library, I noticed that Unicode escape sequences were not being handled correctly.

The original JavaParser grammar is here: https://github.com/javaparser/javaparser/blob/master/javaparser-core/src/main/javacc/java.jj#L630

I tried modifying the character literal rule as shown below, but then I got an error about having too many characters in a character list entry.

  < CHARACTER_LITERAL:
      "'"
      // TODO: Could (and the duplicate code in STRING_LITERAL) this be extracted out?
      (
          (~["'","\\","\n","\r"])
       |
          // unicode escape sequences
          (~["\\u005c\\u005c", "\u005cn", "\\u005r",])
       |
apixandru commented 2 years ago

It turns out that this is not actually necessary if the escape sequence is


  < CHARACTER_LITERAL:
      "'"
      // TODO: Could (and the duplicate code in STRING_LITERAL) this be extracted out?
      (
          (~["'","\\","\n","\r"])
       |
          // starts off with unicode backslash
          ("\\" "u" "0" "0" "5" ["c", "C"]
              // regular escape sequences
              (["n","t","b","r","f","\\","'","\""]
                  // escapes another unicode backslash
                  | ("\\" "u" "0" "0" "5" ["c", "C"])))
       |
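The working rule can be approximated with a java.util.regex pattern, which makes it easy to check which raw spellings the token accepts. This is a sketch, not the token manager ParserGeneratorCC generates: the class CharLiteralMatcher is hypothetical, and because the rule in the issue is cut off, the plain escape alternative (a raw backslash followed by n, t, b, r, f, a backslash, or a quote) is an assumption about the omitted part. Octal escapes are not covered.

```java
import java.util.regex.Pattern;

public final class CharLiteralMatcher {

    // A backslash spelled as a Unicode escape: backslash, 'u', "005", then 'c' or 'C'.
    private static final String UNICODE_BACKSLASH = "\\\\u005[cC]";

    // Characters allowed after a backslash in a simple escape sequence.
    private static final String ESCAPE_CHAR = "[ntbrf\\\\'\"]";

    static final Pattern CHARACTER_LITERAL = Pattern.compile(
        "'(?:"
            + "[^'\\\\\\n\\r]"                                   // any char except quote, backslash, LF, CR
            + "|\\\\" + ESCAPE_CHAR                              // assumed: plain escape such as a raw backslash + n
            + "|" + UNICODE_BACKSLASH                            // starts off with a Unicode backslash...
            + "(?:" + ESCAPE_CHAR + "|" + UNICODE_BACKSLASH + ")" // ...then an escape char or another Unicode backslash
            + ")'");

    static boolean isCharLiteral(String token) {
        return CHARACTER_LITERAL.matcher(token).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"'a'", "'\\n'", "'\\u005c\\u005c'", "'\\u005Cn'", "'\\u005c'", "'ab'"};
        for (String s : samples) {
            System.out.println(s + " -> " + isCharLiteral(s));
        }
    }
}
```

With this shape, '\u005c\u005c' and '\u005Cn' are each accepted as a single character, while '\u005c' on its own is rejected, which matches Java's reading of it as the unterminated literal '\'.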