igordejanovic / parglare

A pure Python LR/GLR parser - http://www.igordejanovic.net/parglare/
MIT License
135 stars 32 forks source link

Either not parsing correctly or I misunderstand something #114

Closed stuartlangridge closed 4 years ago

stuartlangridge commented 4 years ago

Description

I'm using the following code. The key issue seems to be the optional KindWith?, which seems to break parsing of a simple sentence? Perhaps the code will make it clearer:

from parglare import Grammar, GLRParser

grmr = """
Sentence: KindDefinitionSentence | OtherSentence;
KindDefinitionSentence: "a" IdentifierWord* "is" "a" "kind" "of" IdentifierWord* KindWith? DOT;
OtherSentence: IdentifierWord* DOT;
KindWith: "with" IdentifierWord*;

terminals

IdentifierWord: /\w+/;
DOT: ".";
"""

def parse(s):
    print(parser.parse(s))

grammar = Grammar.from_string(grmr)
parser = GLRParser(grammar, actions={
    "OtherSentence": lambda c, m: "Other (failure)",
    "KindDefinitionSentence": lambda c, m: "Kind (success)"
})
# What we WANT to happen is that this is parsed as a KindDefinitionSentence.
# But it is not: it is parsed as an OtherSentence.
# So the output is "Other (failure)" when it should be "Kind (success)".
parse("a car is a kind of vehicle.")
# This works OK, though, and returns two parses, one of which is
# a KindDefinitionSentence as expected.
parse("a car is a kind of vehicle with wheels.")

I think that "a car is a kind of vehicle" ought to match a KindDefinitionSentence (the KindWith part is not present, but it is explicitly optional). When using "a car is a kind of vehicle with wheels." then it parses correctly (GLRParser returns two possible parses, one of which is the KindDefinitionSentence as expected). Perhaps I have misunderstood what an "optional" item is, but since KindWith is optional, I would have expected that a sentence without that part would still parse correctly, and it doesn't.

The code above outputs:

['Other (failure)']
# that is, "a car is a kind of vehicle." only parses as an OtherSentence
['Kind (success)', 'Other (failure)']
# that is, "a car is a kind of vehicle with wheels." parses successfully as either.

Am I writing the grammar wrong somehow?

igordejanovic commented 4 years ago

Your assumption is correct. It should return both results but the currently implemented solution to deal with empty reductions is too restrictive and some valid solutions gets dropped. In this case the algorithm prefers non-empty solution over empty and thus returns only OtherSentence.

This is related to #112

I'm working on a fix.

igordejanovic commented 4 years ago

It is fixed on the master branch. It should now correctly handle all context-free grammars. You can see the regression test for your issue

Notice that there are 2 valid solution for your first sentence and 3 valid solution for the second.

Thanks for the contribution. I'm closing this issue. Feel free to open if you notice anything wrong.