lukehutch / pikaparser

The Pika Parser reference implementation
MIT License

Escaped characters do not match #4

Closed: wrandelshofer closed this issue 4 years ago

wrandelshofer commented 4 years ago

I have made the following grammar description:

    Program <- Move (S* Move)*;
    Move <- "R"/"U"/"F"/"L"/"D"/"B";
    S <- " "/"\n";

I expected this grammar to fully parse the following input string:

    input = "R\nB";

However, the returned MemoTable contained matches only for "R" and "B", not for the entire sequence "R\nB".

I noticed that MetaGrammar instantiates a CharSeq object with the following argument for the parameter str:

    "\nn";

Note the extra 'n' character after '\n'.

I suspect the problem is in the method unescapeString of class MetaGrammar. It appears that unescapeString consumes only the backslash, not the character that follows it, so the escaped character is appended a second time as a literal.

I was able to get the desired result by adding the line "i++; // consume escaped character" to unescapeString:

    private static String unescapeString(String str) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\\') {
                if (i == str.length() - 1) {
                    // Should not happen
                    throw new IllegalArgumentException("Got backslash at end of quoted string");
                }
                buf.append(unescapeChar(str.substring(i, i + 2)));
                i++; // consume escaped character
            } else {
                buf.append(c);
            }
        }
        return buf.toString();
    }
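
The effect of the missing increment can be reproduced in isolation. The following is a minimal sketch, not the pikaparser source: the class name UnescapeDemo, the consumeEscapedChar flag, and the body of unescapeChar are assumptions made for illustration (the real MetaGrammar.unescapeChar is not shown in this issue and may handle more escapes). It runs the same loop with and without the `i++` line.

```java
public class UnescapeDemo {
    // Hypothetical stand-in for MetaGrammar.unescapeChar: takes a two-character
    // escape such as "\n" (backslash, 'n') and returns the character it denotes.
    public static char unescapeChar(String escape) {
        char c = escape.charAt(1);
        switch (c) {
            case 'n': return '\n';
            case 't': return '\t';
            case 'r': return '\r';
            default:  return c; // e.g. \\ -> \ and \" -> "
        }
    }

    // Same loop as in the issue. The flag (an assumption, for comparison only)
    // switches the "i++" fix on or off.
    public static String unescape(String str, boolean consumeEscapedChar) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\\') {
                if (i == str.length() - 1) {
                    throw new IllegalArgumentException("Got backslash at end of quoted string");
                }
                buf.append(unescapeChar(str.substring(i, i + 2)));
                if (consumeEscapedChar) {
                    i++; // consume escaped character
                }
            } else {
                buf.append(c);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String raw = "\\n"; // two characters: backslash followed by 'n'
        // Without the fix: newline plus a stray 'n', so length 2.
        System.out.println(unescape(raw, false).length());
        // With the fix: a single newline, so length 1.
        System.out.println(unescape(raw, true).length());
    }
}
```

Without the increment, the loop revisits the 'n' on the next iteration and appends it as an ordinary character, which would explain the "\nn" argument described above.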

lukehutch commented 4 years ago

This wasn't thoroughly tested before -- thanks for the bug report, and for digging in to find the fix! Committed to master.