Closed hniksic closed 5 years ago
currently the attributes of a lark.exceptions.UnexpectedCharacters don't seem to provide enough details to craft a user-friendly exception
Why not? Which information is missing?
Here are the attributes currently existing:
For example, in case of bla
, the allowed
attribute contains set([u'ENDLINE', u'HELLO', u'__IGNORE_0'])
. The first two will be meaningful to the end user who is not familiar with the formal grammar of the DSL, and the last one appears auto-generated.
The name is auto-generated. If you want it to have a better name, you can give it one in the grammar.
You can also create your own error message, instead of using Lark's, which is intended at developers.
I'll do that, thanks. I guess the auto-generated name comes from the %ignore
directive.
If you want to keep this issue, the idea would be for Lark to provide a user-facing error message without referring to the names from the grammar, something like "Expected one of hello
, \n
or ;
". If this is not feasible or desirable, feel free to close it.
It would be better for the users of the tools, but not for those who write the grammars.
But thanks for the suggestion.
This could be resolved by the message beginning with the "user" part and then following up with additional information for the grammar developer. The latter would be easy to cut off when providing the error message to the end user.
(My use case is a large desktop application where the end user is not aware that the program is written using Lark, or even Python.)
I don't want to encourage lazy programming.
The correct way to do it is create your own error message using the exception's attributes. There is even a get_context
method to get the same formatting Lark.
There is a related problem with the UnexpectedToken
exception (and a more complex grammar). This exception as an expected
attribute, but I'm not sure how to interpret it. The message looks like this:
Unexpected token Token($END, '') at line 1, column 1.
Expected one of:
* FILE
* HEADER
* __ANON_2
* COLUMNS
* SEPARATOR
* ENDLINE
The expected
attribute contains [u'FILE', u'HEADER', '__ANON_2', u'COLUMNS', u'SEPARATOR', 'ENDLINE']
.
It seems that the grammar rules are uppercased, which mixes them with lexical terminators. Is that intentional?
More importantly, I'm not sure how to find out what __ANON_2
refers to. Is it a case of me forgetting to name a terminator?
They are called terminals.
expected
only contains terminals. Why do you think it's wrong?
Anonymous tokens are created by using terminals in rules, without defining a name for them. for example:
dict: "{" items "}"
The curly braces are anonymous. If you want to make sure they'll have a name, you have to define:
LCURLY: "{"
etc.
Maybe I am misunderstanding something. In the above exception expected
seems to contain a mixture of rules (lower-case in the grammar, but upper-cased in expected
) and terminals (upper-case in both the grammar and expected
). The grammar doesn't contain an entry called FILE
(all-caps), for example, but it's listed. Is that intentional?
Also, the expected
list contains a mixture of byte and unicode strings - not a problem, just mentioning in case it's relevant.
seems to contain a mixture of rules (lower-case in the grammar, but upper-cased in expected)
If you can show this to be true, it would be considered a bug.
You should probably take into consideration that Lark automatically names anonymous terminals when it makes sense. So a string of "file"
will be automatically named FILE
if there is no collision.
Re mixed byte & unicode, it's no big deal, but definitely not the intended behavior. If you provide a small example showing this, then I'll fix it.
Thanks, that clears the confusion. The issue was that I have a rule called file
which happens to refer to an anonymous terminal file
, like this:
file: "file" file_pattern ENDLINE
As noted, the string "file"
was automatically named FILE
, which is why such string ended up in the error message.
I'll report the byte/unicode thing in a separate issue as soon as I reduce it to a minimal example.
I would like to use lark to build a user-facing parser with nice error messages. (Not fancy error recovery, just readable error messages.) Consider, for example, this grammar and script:
When fed the invalid string
hello
followed by newline, the parser emits the following error message:The Python set syntax seems out of place in the error message. More importantly, the end user has no way to interpret what to make of
u'__IGNORE_0'
.Likewise, when fed random text like
bla
, the error says:Is it possible to customize the errors so they don't include names of lexer terminals like
ENDLINE
or__IGNORE_0
?Note that I am perfectly fine with writing additional code to convert lark exceptions to more user-friendly exceptions, but currently the attributes of a
lark.exceptions.UnexpectedCharacters
don't seem to provide enough details to craft a user-friendly exception.