Closed by psfblair 4 years ago
I'll try to look into this. Something is definitely still broken. Weirdly, I also tried the Parboiled2 PEG grammar for Java 1.6, which I used successfully for all the benchmarks, and it is no longer working either with the git master version of the pika parser. I'm not sure what is going on, but I'm a bit swamped right now, so I'll need a few days to look at this. If you wanted to dig into it, I suggest taking a few small Java files and trying to parse them, slowly adding in additional features in the source until something breaks.
This is an update of the tests using a modification of Roman Redziejowski's Java 1.8 PEG grammar for Mouse; the comments had to be removed, since the pika parser can't handle them. (The copyright info is in the adjoining file.)
The pika parser was not quite able to handle that grammar, so besides the file with comments removed (Java.1.8.original_without_comments.peg), there is also a modified Java.1.8.peg which loads successfully, though it still does not produce a completely successful parse of the accompanying MemoTable.java file.
Here is what pika parser had trouble with:
1. Missing PEG underscore clause
This gives:
The grammar seems to be using the underscore as a PEG wildcard character (it also appears in one other place in the file). I was able to define an underscore rule to fix the issue, though now it only handles printable ASCII:
2. Difficulty defining Java comments in the grammar
I don't even really understand what this rule is supposed to mean; this is the other place the underscore was used. However, even defining the underscore rule didn't solve the problem. I ultimately tried the following, which allowed the grammar to be loaded but which still didn't result in a successful parse of Java comments:
3. Syntax error for escaped hyphens in character ranges in the grammar
Putting escaped hyphens inside a character range doesn't seem to work, even though there is a rule for them:
Removing the escaped hyphen fixes the issue, though this prohibits subtraction and negative exponents.
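For reference, here is a minimal sketch in Java of how the body of a character class could be expanded so that `\-` is a literal hyphen while an unescaped `-` between two characters forms a range. This is not the pika parser's implementation; `CharClassSketch` and `expand` are hypothetical names, and for simplicity any backslash escape is treated as the literal next character (so `\n` would not become a newline here).

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of the pika parser: expands the text between
// '[' and ']' of a character class into the set of chars it matches.
final class CharClassSketch {

    static Set<Character> expand(String body) {
        Set<Character> chars = new LinkedHashSet<>();
        int i = 0;
        while (i < body.length()) {
            // Read a single char (or the lower bound of a range), resolving an escape.
            char lo = body.charAt(i++);
            if (lo == '\\') {
                if (i >= body.length()) {
                    throw new IllegalArgumentException("dangling backslash");
                }
                lo = body.charAt(i++); // escaped char is taken literally, e.g. '-'
            }
            char hi = lo;
            // Only an unescaped '-' with a char after it starts a range.
            if (i + 1 < body.length() && body.charAt(i) == '-') {
                i++; // skip the range operator
                hi = body.charAt(i++);
                if (hi == '\\') {
                    if (i >= body.length()) {
                        throw new IllegalArgumentException("dangling backslash");
                    }
                    hi = body.charAt(i++);
                }
                if (hi < lo) {
                    throw new IllegalArgumentException("reversed range");
                }
            }
            // Use an int counter so a range ending at the largest char cannot overflow.
            for (int c = lo; c <= hi; c++) {
                chars.add((char) c);
            }
        }
        return chars;
    }

    public static void main(String[] args) {
        System.out.println(expand("+\\-"));    // prints [+, -]
        System.out.println(expand("a-c\\-"));  // prints [a, b, c, -]
    }
}
```

With this approach a class such as `[+\-]` matches either sign, which is what an exponent rule needs.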
4. Some problem with zero or more of an alternative in the grammar
I'm not sure what caused the problem with this one:
Removing either of the alternatives and one set of parentheses fixes the issue. The modified Java grammar file gets rid of the INSTANCEOF.
5. Unable to handle unicode in the grammar
After those fixes, I encountered the following in a second pass:
Fixing this required removing both of the unicode escape characters that appear in the grammar file.
(I also wonder if it might be worth allowing for parsing UTF-32, which requires two more hex digits. Also, is it convention to use only the lowercase u for escaping unicode?)
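On the escape question: as far as I know, Java source itself only recognizes the lowercase `\u` followed by four hex digits (code points above U+FFFF are written as surrogate pairs), while C, C++ and Python use `\u` for four digits and an uppercase `\U` for eight. Below is a minimal Java sketch of decoding both a four-digit form and a six-digit form. The six-digit `\U` form is a hypothetical extension matching the "two more hex digits" idea, not something the pika parser or the Mouse grammar defines, and `UnicodeEscapeSketch.decode` is an illustrative name.

```java
// Hypothetical helper, not part of the pika parser: decodes one escape sequence
// written as backslash + 'u' + 4 hex digits, or backslash + 'U' + 6 hex digits
// (the 6-digit form is an assumption made for this sketch).
final class UnicodeEscapeSketch {

    static String decode(String escape) {
        if (escape.length() < 2 || escape.charAt(0) != '\\') {
            throw new IllegalArgumentException("bad escape: " + escape);
        }
        char kind = escape.charAt(1);
        int digits = kind == 'u' ? 4 : kind == 'U' ? 6 : -1;
        if (digits < 0 || escape.length() != 2 + digits) {
            throw new IllegalArgumentException("bad escape: " + escape);
        }
        // Accumulate the hex digits into a code point.
        int codePoint = 0;
        for (int i = 2; i < escape.length(); i++) {
            int d = Character.digit(escape.charAt(i), 16);
            if (d < 0) {
                throw new IllegalArgumentException("bad hex digit in: " + escape);
            }
            codePoint = codePoint * 16 + d;
        }
        // Six hex digits can exceed the Unicode maximum of 0x10FFFF.
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("code point out of range: " + escape);
        }
        // toChars yields one char for the BMP and a surrogate pair above it.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(decode("\\u00e9").codePointAt(0)); // prints 233
        System.out.println(decode("\\U01F600").length());     // prints 2
    }
}
```

The second call shows why four digits are not enough on their own: a code point above U+FFFF occupies two Java `char` values, so a parser working on `char` ranges would need to compare code points instead.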
6. Syntax errors when parsing the Java file
After the grammar was loaded I got a heap of syntax errors trying to parse MemoTable.java. A lot of these are comments, and I wonder how many others will resolve once the issue with comments is resolved.