lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Better token formatting available? #836

Open pmiddend opened 3 years ago

pmiddend commented 3 years ago

What is your question?

I have a Lark grammar that has the following production:

!inneroperator : "has" | "=" | "<" | ">" | "<=" | ">=" | "!="

(The whole grammar should parse statements like foo < 3, pretty simple stuff actually)

When I parse something invalid, such as foo 3, the error message reads:

No terminal defined for '3' at line 1 col 8
foo 3
Expecting: {'EQUAL', 'HAS', 'MORETHAN', '__ANON_1', '__ANON_0', '__ANON_2', 'LESSTHAN'}

This is already quite readable, and I'm impressed. However, I don't know what these ANON things are, and also I would ideally output = instead of EQUAL. Is that possible?

MegaIng commented 3 years ago

This is a drawback of the Earley parser at the moment. If you use lalr instead (which your grammar should be fine for), you already get nicer messages.

You can also prevent this by naming the Terminals:

!inneroperator : "has" | EQ | NE | LT | LE | GT | GE

EQ: "="
NE: "!="
LT: "<"
LE: "<="
GT: ">"
GE: ">="

pmiddend commented 3 years ago

I get basically the same error message with the lalr parser. I tried naming the terminals, but neither EQUAL nor EQ makes it clear to the user that it's = that's expected (I mean, it could be ==, too).

erezsh commented 3 years ago

There has already been some talk about improving the error messages by providing the expected values, so I will keep that in mind as a possible task.

Another, easier fix we could make is to include default names for >=, <= and !=, as they are common enough.
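
Until something like that exists in Lark itself, one possible workaround is to translate the expected terminal names back into their literal text before showing the message to the user. The following is only a minimal sketch: the helper `describe_expected` and the `LITERALS` table are hypothetical, written by hand for the grammar in this thread, and the assignment of `__ANON_0`..`__ANON_2` to `<=`, `>=`, `!=` is an assumption (grammar order), not something the error output confirms.

```python
def describe_expected(expected, literals):
    """Render terminal names as the text a user would actually type."""
    shown = []
    for name in sorted(expected):
        literal = literals.get(name)
        # Fall back to the raw terminal name when no literal is known.
        shown.append(repr(literal) if literal is not None else name)
    return "Expecting one of: " + ", ".join(shown)


# Hypothetical hand-written mapping from terminal name to literal text.
# The __ANON_n assignments are assumed, not taken from Lark.
LITERALS = {
    "HAS": "has",
    "EQUAL": "=",
    "LESSTHAN": "<",
    "MORETHAN": ">",
    "__ANON_0": "<=",
    "__ANON_1": ">=",
    "__ANON_2": "!=",
}

# The set reported in the error message quoted in this thread.
expected = {"EQUAL", "HAS", "MORETHAN", "__ANON_1", "__ANON_0", "__ANON_2", "LESSTHAN"}
message = describe_expected(expected, LITERALS)
print(message)
```

In a real program the `expected` set would presumably come from the caught exception (as far as I know, `allowed` on `UnexpectedCharacters` and `expected` on `UnexpectedToken`), and the table could likely be built from the parser's terminal definitions rather than typed out by hand. Both points should be checked against the Lark version in use.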

pmiddend commented 3 years ago

Aside from that (and it's not an easy problem, of course), what are the ANON tokens mentioned?

erezsh commented 3 years ago

When a terminal doesn't have a name (i.e. defined as "foo"), and Lark can't guess the name, it calls it ANON.
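
To make that rule concrete, here is a simplified model of the naming behaviour described above. It is not Lark's actual code: the `KNOWN_SYMBOLS` table and the `name_terminals` function are illustrative assumptions, chosen only so that the result matches the names seen in the error message earlier in this thread (HAS, EQUAL, LESSTHAN, MORETHAN and three anonymous terminals).

```python
# Illustrative table only; these three entries match the names in the
# error message quoted in this thread. Lark's real table may differ.
KNOWN_SYMBOLS = {"=": "EQUAL", "<": "LESSTHAN", ">": "MORETHAN"}


def name_terminals(literals):
    """Assign a terminal name to each anonymous string literal."""
    names = {}
    anon_count = 0
    for text in literals:
        if text in KNOWN_SYMBOLS:
            # A well-known symbol gets a readable default name.
            names[text] = KNOWN_SYMBOLS[text]
        elif text.isidentifier():
            # A word-like literal is named after itself, uppercased.
            names[text] = text.upper()
        else:
            # Nothing to guess from, so fall back to a numbered placeholder.
            names[text] = f"__ANON_{anon_count}"
            anon_count += 1
    return names


names = name_terminals(["has", "=", "<", ">", "<=", ">=", "!="])
for literal, name in names.items():
    print(f"{literal!r:6} -> {name}")
```

Under this model, `<=`, `>=` and `!=` are the three literals that end up as `__ANON_0`, `__ANON_1` and `__ANON_2`, which is why naming them explicitly in the grammar makes the placeholders disappear.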