lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Lalr parser raises UnexpectedToken('$END', ...) rather than UnexpectedEOF #791

Open zevisert opened 3 years ago

zevisert commented 3 years ago

Describe the bug

When the input is exhausted, the earley parser raises lark.exceptions.UnexpectedEOF(...), while the lalr parser raises lark.exceptions.UnexpectedToken('$END', ...).

For consistency's sake, in lalr parsers, if the token in the raised UnexpectedToken error is '$END', the error should be re-raised as UnexpectedEOF.

Some extra context

I am building an application that requires parsing a stream, and I had switched to the (_much faster_) lalr parser, but as my stream may require assembling several 'chunks' to create a valid record, I was catching `UnexpectedEOF` from earley, but now I have to catch `UnexpectedToken` and drill into the error to check the token:

```py
except lark.exceptions.UnexpectedToken as err:
    if err.token == lark.Token("$END", ""):
        logger.debug("Parser expected more data, waiting for another chunk")
    else:
        raise err
```
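For readers who want the chunk-assembling pattern to work with either parser, here is a minimal sketch. It is an illustration, not part of lark: the names `ran_out_of_input`, `parse_chunks` and `END_TOKEN_TYPE` are hypothetical, and the check is deliberately duck-typed (exception class name, `token.type`) so the snippet does not need to import lark. In real code an `isinstance` check against `lark.exceptions.UnexpectedEOF` and `lark.exceptions.UnexpectedToken` would probably be preferable.

```python
END_TOKEN_TYPE = "$END"  # token type that appears in the lalr error in this report


def ran_out_of_input(err):
    """Return True if `err` looks like an 'input exhausted' parse error."""
    # Earley case: the exception class is named UnexpectedEOF.
    if type(err).__name__ == "UnexpectedEOF":
        return True
    # LALR case: an error whose offending token has type '$END'.
    token = getattr(err, "token", None)
    return getattr(token, "type", None) == END_TOKEN_TYPE


def parse_chunks(parse, chunks):
    """Yield one parse result per complete record assembled from `chunks`.

    `parse` is any callable that takes a string, for example the bound
    `.parse` method of a `lark.Lark` instance (an assumption of this sketch).
    """
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        try:
            result = parse(buffer)
        except Exception as err:
            if ran_out_of_input(err):
                continue  # record is incomplete, wait for the next chunk
            raise  # a genuine syntax error, let it propagate
        buffer = ""
        yield result
```

With the grammar from this report, something like `parse_chunks(lark.Lark(grammar, parser="lalr").parse, ["AA", "AA"])` would be expected to yield a single tree once both chunks have arrived, and the same call should behave identically with `parser="earley"`.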

To Reproduce

```py
import sys, lark
print(f"python: {sys.version_info}\nlark: {lark.__version__}\n\n")

grammar = 'start: "A" ~ 4'  # 4 sequential A's

try:
    lark.Lark(grammar, parser="earley").parse("AA")
except Exception as err:
    print("Earley err:", type(err), *err.args)

try:
    lark.Lark(grammar, parser="lalr").parse("AA")
except Exception as err:
    print("Lalr err:", type(err), *err.args)
```

Output

```log
python: sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)
lark: 0.11.1

Earley err: Unexpected end-of-input. Expected one of:
	* A

Lalr err: Unexpected token Token('$END', '') at line 1, column 2.
Expected one of:
	* A
```

MegaIng commented 3 years ago

Note that this is something that might break compatibility. This is something we have in mind, and I think we also agree that it would be better for both parsers to throw the same exception. (Note that this includes the possibility of making the earley parser throw UnexpectedToken, but you are making a decent case for keeping UnexpectedEOF.)

While this is certainly a good change, it might only happen in 1.0 (or we temporarily make UnexpectedEOF behave like an UnexpectedToken, but that seems a bit hacky).
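To make the "temporary" option concrete, here is one way it could look, sketched with stand-in classes. These are not lark's actual exception classes or constructor signatures; `token_type`, `expected`, `parse_truncated` and both handler functions are hypothetical names used only for illustration. The idea is that an EOF error which subclasses the unexpected-token error keeps existing `except UnexpectedToken` handlers working while also allowing `except UnexpectedEOF`.

```python
class UnexpectedToken(Exception):
    """Stand-in for a parser's unexpected-token error (not lark's class)."""

    def __init__(self, token_type, expected):
        super().__init__(f"Unexpected token {token_type!r}")
        self.token_type = token_type
        self.expected = expected


class UnexpectedEOF(UnexpectedToken):
    """Stand-in EOF error that is also an UnexpectedToken for '$END'."""

    def __init__(self, expected):
        super().__init__("$END", expected)


def parse_truncated():
    # Pretend the parser ran out of input while still expecting an "A".
    raise UnexpectedEOF(expected={"A"})


def old_style_handler():
    # Code written against the current lalr behaviour keeps working.
    try:
        parse_truncated()
    except UnexpectedToken as err:
        return err.token_type == "$END"


def new_style_handler():
    # Code written against the earley behaviour works as well.
    try:
        parse_truncated()
    except UnexpectedEOF:
        return True
```

The trade-off hinted at in the comment above is visible here: the EOF error has to carry a synthetic `$END` token purely for backwards compatibility, which is arguably the "hacky" part.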

zevisert commented 3 years ago

Yeah, this is definitely a breaking change either way, as the different exception types can change the control flow of a program. You've seen my use case, so I would prefer both parsers to raise UnexpectedEOF. That said, there are easy workarounds here until 1.0 lands.

Thanks for the great library!

ThatXliner commented 3 years ago

Yes, consistency would make error-catching much easier.