lukehutch / pikaparser

The Pika Parser reference implementation
MIT License
141 stars 12 forks source link

Handling of OneOrMore in getSubClauseMatches in Match.java #6

Closed psfblair closed 4 years ago

psfblair commented 4 years ago

In Match.java in the method public List<Entry<String, Match>> getSubClauseMatches() I see the handling of OneOrMore:


if (memoKey.clause instanceof OneOrMore) {
    // Flatten right-recursive structure of OneOrMore parse subtree
    var subClauseMatchesToUse = new ArrayList<Entry<String, Match>>();
    for (Match curr = this; curr.subClauseMatches.length > 0; curr = curr.subClauseMatches[1]) {
        subClauseMatchesToUse.add(new SimpleEntry<>(curr.memoKey.clause.labeledSubClauses[0].astNodeLabel,
                curr.subClauseMatches[0]));
        if (curr.subClauseMatches.length == 1) {
            // Reached end of right-recursive matches
            break;
        }
    }
    return subClauseMatchesToUse;
}

It is unclear to me what the loop is doing here. It looks like you're always taking curr.subClauseMatches[0] and curr.memoKey.clause.labeledSubClauses[0] and then you hit the break statement.

I don't see that either of the arrays is being modified within the loop. Could this not be replaced simply by

if (this.subClauseMatches.length > 0) {
     subClauseMatchesToUse.add(
          new SimpleEntry<>(
                curr.memoKey.clause.labeledSubClauses[0].astNodeLabel,
                curr.subClauseMatches[0]
          )
     )
}
psfblair commented 4 years ago

I missed the modification in the increment part of the for loop.

lukehutch commented 4 years ago

No problem, I really appreciate another set of eyes on the code!

Yeah, this code just follows the linked list formed by the right-recursive structure of the OneOrMore operator, and copies each linked list element into the ArrayList, along with the AST node label (if any).

You can see how that structure is built in the OneOrMore.match method. It's similar to but not exactly the same as a Lisp list -- I decided to terminate the list with a one-element node, rather than a zero-element node, i.e. (A B C D) would be turned into right-recursive form as (A (B (C (D)))) rather than (A (B (C (D ())))).

The paper explains why OneOrMore is rendered into right-recursive form.