Open erezsh opened 6 years ago
Any update?
I have a version that should be working for LALR, in branch end_symbol
If anyone wants to test it and let me know how it goes, that would be great!
(that's https://github.com/lark-parser/lark/tree/end_symbol)
@isidentical @jruales @coreyt
I did a rebase of this onto the master branch. There was a conflict in lalr_parser.py
that I'm not sure I resolved correctly.
https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660
I but all the tests seem to pass. To be sure I added another test in and it failed
https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660
======================================================================
ERROR: test_end_symbol2 (tests.test_parser._make_parser_test.<locals>._TestParser)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/larpy/lark/tests/test_parser.py", line 1659, in test_end_symbol2
self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
File "/Users/larpy/lark/lark/lark.py", line 293, in parse
return self.parser.parse(text, start=start)
File "/Users/larpy/lark/lark/parser_frontends.py", line 88, in parse
return self._parse(token_stream, start)
File "/Users/larpy/lark/lark/parser_frontends.py", line 54, in _parse
return self.parser.parse(input, start, *args)
File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 36, in parse
return self.parser.parse(*args)
File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 110, in parse
reduce(arg)
File "/Users/larpy/lark/lark/parsers/lalr_parser.py", line 77, in reduce
value = self.callbacks[rule](s)
File "/Users/larpy/lark/lark/parse_tree_builder.py", line 28, in __call__
res = self.node_builder(children)
File "/Users/larpy/lark/lark/parse_tree_builder.py", line 121, in __call__
filtered.append(children[i])
IndexError: list index out of range
----------------------------------------------------------------------
I need to dive deep to get a better understanding of what is going on here exactly, but meanwhile does anything look off about the code I linked or the test I wrote?
The test seems fine. I think the exception happens because some of the new preprocessing code isn't aware that $
isn't a real terminal (in the sense that it doesn't match any text)
I'll look into it soon, unless you'll manage to work it out sooner.
I've done a bit more digging
2 things that may or may not be relevant
1st: above test fails with the same error on the original branch (before rebase)
2nd: Altering the test slightly produces a different error:
@unittest.skipIf(PARSER!='lalr', "Using the end symbol currently works for LALR only")
def test_end_symbol2(self):
grammar = """
start: (a|b)+
a: "a" (E|ND)
b: "b"
E: $
ND: "x"
"""
parser = _Lark(grammar)
self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
self.assertRaises(UnexpectedInput, parser.parse, 'ab')
======================================================================
FAIL: test_end_symbol2 (tests.test_parser._make_parser_test.<locals>._TestParser)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/larpy/lark/tests/test_parser.py", line 1659, in test_end_symbol2
parser = _Lark(grammar)
File "/Users/larpy/lark/tests/test_parser.py", line 535, in _Lark
return Lark(grammar, lexer=lexer_class_or_name, parser=PARSER, propagate_positions=True, **kwargs)
File "/Users/larpy/lark/lark/lark.py", line 206, in __init__
self.terminals, self.rules, self.ignore_tokens = self.grammar.compile(self.options.start)
File "/Users/larpy/lark/lark/load_grammar.py", line 494, in compile
for name, (term_tree, priority) in term_defs if term_tree]
File "/Users/larpy/lark/lark/load_grammar.py", line 494, in <listcomp>
for name, (term_tree, priority) in term_defs if term_tree]
File "/Users/larpy/lark/lark/lexer.py", line 69, in __init__
assert isinstance(pattern, Pattern), pattern
AssertionError: None
----------------------------------------------------------------------
In the rebase case this fails in a different way, because an assert has been added
https://github.com/lark-parser/lark/blob/master/lark/load_grammar.py#L637
removing the assert and it fails in the same way.
Hope these findings are helpful. About to go vacation so probably won't get back to looking at this for a couple weeks at least
Hi @jisaacstone,
I just created a new branch, end_symbol3
, which merges end_symbol
with the most recent master.
I don't get the same exceptions as you. In fact, everything works perfectly for me.
I also added the two tests you outline here (renamed the 2nd version to test_end_symbol3
, with a few corrections), and they all seem to pass.
Let me know if you think I missed something. Otherwise, all you have left to do, is to get it to work for Earley.
Enjoy your vacation!
OK Looks like my original thought was correct and this section of code i did not resolve the conflicts correctly
https://github.com/jisaacstone/lark/blob/end_symbol/tests/test_parser.py#L1650-L1660
Should ought to have gone back to that when I hit trouble. Sorry for the misdirection
+1 for $ to represent end of string. I just ran into a need for this.
Just pushed this into master
. You can now use $
in the LALR parser. (No Earley support yet)
Here's an example from the tests that shows what it does:
def test_end_symbol2(self):
grammar = """
start: (a|b)+
a: "a" ("x"|$)
b: "b"
"""
parser = _Lark(grammar)
self.assertEqual(parser.parse('axa'), Tree('start', [Tree('a', []),Tree('a', [])]))
self.assertRaises(UnexpectedInput, parser.parse, 'ab')
I've just checked master
and I can't find test_end_symbol2
and the example grammar listed above doesn't import.
E lark.exceptions.GrammarError: Unexpected input at line 2 column 13 in calculator/grammar.lark:
E
E a: "a" ("x"|$)
E ^
lark/lark/load_grammar.py:890: GrammarError
I don't claim to be even remotely be an expert on parsing but not being able to match $
seems to make even the simplest of grammars unparseable.
For instance (simplified):
start: expr+
expr: expr OPER expr
| SIGNED_NUMBER
OPER: "+" | "-" | "*" | "/"
parse("1-1") # Tree('start', [Tree('statement', [1.0]), Tree('statement', [-1.0])])
whereas start: (expr ("\n"|$))+
would actually pick expr oper expr
Looks like you reverted in 3bee21051e8440e506ea13f45b224e2f6d668662 but you don't say why.
If I remember correctly, there were some problems with unexpected side effects.
I don't claim to be even remotely be an expert on parsing but not being able to match $ seems to make even the simplest of grammars unparseable.
Maybe, but your example is wrong:
start: expr ("\n" expr?)*
works.
So it does... again, not an expert, but that's very non intuitive.
Thanks for the help!
@kneufeld I reverted it because it was buggy.
I agree it would be a nice feature, but I don't think it's a necessity.
Is there any update? I kind of need this. Or is there a workaround?
I'll see what I can do
@ThatXliner See PR #880
Can be signified with
$
Must be optional (grammars don't have to contain it to be correct)
Need to make sure it works for both LALR and Earley, and with indentation.