lukehutch / pikaparser

The Pika Parser reference implementation
MIT License
141 stars 12 forks source link

Additional fixes for Java PEG grammar #18

Closed psfblair closed 4 years ago

psfblair commented 4 years ago

The grammar can now parse MemoTable.java successfully.

Also, the previous fix for readAllLines wasn't preserving a linebreak at the end of file; this fix to TestUtils preserves all characters in the file.

There are still a few issues which result in syntax errors using the original Java PEG grammar, and one additional issue with respect to parsing comments:

  1. With regard to comments, the multiline comment pattern I submitted previously doesn't seem to work:

"/*" (_ !"*/")* "*/"

I take this as a / followed by zero or more characters that are not followed by /, followed by */ .

When I'm debugging an attempted match on a file containing only / / I see the following at the space between the asterisks:

=============== POSITION: 2 CHARACTER:[ ] ===============
Failed to match: "/*" : 2
Setting new best match: _ <- [ -~] : 2+1
    Following seed parent clause: _ !"*/"
    Following seed parent clause: EOT <- !_
Matched: _ <- [ -~] : 2
    Following seed parent clause: !"*/"
Failed to match: "*/" : 2
    Following seed parent clause: SUB / ()
Failed to match: SUB <- '\t' : 2
Setting new best match: !"*/" : 2+0
Matched: !"*/" : 2
Failed to match: _ !"*/" : 2
Setting new best match: SUB / () : 2+0
    Following seed parent clause: Compilation <- Spacing (SUB / ()) EOT
Matched: SUB / () : 2
    Following seed parent clause: Compilation <- Spacing (SUB / ()) EOT
Failed to match: EOT <- !_ : 2
Failed to match: Compilation <- Spacing (SUB / ()) EOT : 2
Failed to match: Compilation <- Spacing (SUB / ()) EOT : 2

From what I see, the space fails to match _ !"*/" but somehow it doesn't match the clause _ !"*/" / () which I believe is what the zero-or-more is supposed to turn into.

  1. With unicode escapes in the grammar, I was able to put back:

    SUB <- '\u001a' ;

but not:

```[ \t\r\n\u000C]+```

which is still a syntax error.

  1. The original grammar had the following, which resulted in a syntax error:
InfixExpression
    <- UnaryExpression
          ((InfixOperator UnaryExpression) / (INSTANCEOF ReferenceType))* ;

removing the extra parentheses, however, works:

InfixExpression
    <- UnaryExpression
          (InfixOperator UnaryExpression / INSTANCEOF ReferenceType)* ;
  1. Finally, the issue with escaped hyphens inside character classes remains. For example, this is still a syntax error:
Exponent
    <- [eE] [+\-]? Digits ;

Those are all the issues I have found.

lukehutch commented 4 years ago

Sorry, we had a mid-air collision. Can you please rebase on the master version? In particular, the Java-1.8.peg changes I made are not compatible with yours.

The problem with the comment rules in your original version of the grammar was that _ !"*/* would not match the last character of the comment, since it only matches each character that is not followed by */. Therefore you need to match not "/*" (_ !"*/") "*/", but rather "/**/" / ("/*" (_ !"*/")* _ "*/") or similar. The same problem was also present in the original clause for line comments -- however I simplified that to match "//" [^\n\r]*.

For loadResourceFile, you should use Files.readString(Paths.get(uri)) to read the whole file as a String.

I'll take a look at other issues. It would be helpful if you could please open a separate issue for each problem you find in future. Thanks!

lukehutch commented 4 years ago

Char escaping and unescaping is fixed (problem 2).

I guess you removed the INSTANCEOF rule from the grammar because it didn't work. I fixed it in master by adding INSTANCEOF as an infix operator.

lukehutch commented 4 years ago

OK, the parenthesization of rules is also fixed now (problem 3).

lukehutch commented 4 years ago

The char range escaping should be fixed too (problem 4)

psfblair commented 4 years ago

Wow, I had no intention of hijacking your Saturday! Thanks for all that work!

I'll put in that change for Files.readString -- that's a JDK 11 API I wasn't familiar with and my IDE was still using 10.

I'll also submit a PR with the Java.1.8.peg grammar as close as possible to the original (which is also in the project as Java.1.8.original_withoutcomments.peg -- the comments had to be removed because they were Java-style comments). After your fixes today, the only differences are the definition of and the changes that had to be made for the comments.

Thanks again!

lukehutch commented 4 years ago

You're welcome... You deserve a medal for sticking with this bleeding edge code until the bugs were hammered out! I appreciate your help finding them all.

psfblair commented 4 years ago

You've been very generous and patient with me, thanks so much!

lukehutch commented 4 years ago

No problem at all! I'm very happy you found all those bugs, so that I could fix them. Sorry for all the trouble! I hope it's smooth sailing now.