jamesrhester / Lerche.jl

A Julia port of the Lark parser
MIT License
45 stars 3 forks source link

Undefined behavior when errors in the parser are found #19

Closed ziotom78 closed 3 years ago

ziotom78 commented 3 years ago

The documentation does not mentions what happens when there is an error in the source text passed to the parser. It seems that, similarly to Lark, exceptions are thrown, and these exceptions share the same name as Lark's (UnexpectedInput, UnexpectedCharacters, UnexpectedToken):

julia> j = Lerche.parse(json_parser, raw"{\"key\": [\"item0\", \"item1\", 3.14]]")
Unexpected input: UnexpectedToken(Token(RSQB, ]), ["RBRACE", "COMMA"], 1, 33, nothing, 11, 33, nothing, false)
ERROR: Unexpected token ] at line 1, column 33.
Expected one of: RBRACE, COMMA

Stacktrace:
…

Although Lark's documentation lacks an organic presentation of how error are handled (but it does document UnexpectedToken and UnexpectedCharacters in the documentation for UnexpectedInput), I believe that the usage of exceptions to signal errors and their type should be mentioned somewhere in Lerche's documentation, as this is an important piece of information that helps to ensure robustness in codes.

Disclaimer: this issue was opened following the review happening here: https://github.com/openjournals/joss-reviews/issues/3497.

erezsh commented 3 years ago

@ziotom78 If you don't mind me asking, what do you mean by an organic presentation of how errors are handled?

Anyway, I improved the documentation on exceptions a little. Thanks for pointing that out. (https://lark-parser.readthedocs.io/en/latest/classes.html#lark.Lark.parse)

ziotom78 commented 3 years ago

@ziotom78 If you don't mind me asking, what do you mean by an organic presentation of how errors are handled?

No worry! I meant that a reader would like to find something better than what's inside Lark's documentation. I had to dig a bit to discover how errors are handled, and I managed to find it because I tested with some code that UnexpectedToken was raised whenever a token was in the wrong place and was then able to search for this keyword in the docs. It would have been far better to have a dedicated section in the documentation or (as is your patch) in the docstring for parse.

As I said, your patch looks good and definitely useful, so I'll close this issue.

ziotom78 commented 3 years ago

Sorry, I'm reopening this because I realized that the docs you linked above refer to Lark and not to Lerch.jl. Is the improvement in the documentation already available somewhere? The last commit I see in master dates back to Jun 18.

erezsh commented 3 years ago

I will leave that to James to answer. But I believe Lerche's behavior should match Lark's in this case.

jamesrhester commented 3 years ago

I have updated the Lerche.jl documentation in c26641d111f87a710add563a1b68da5137f21c3d by adding to the docstring for parse (as for Lark), and including the exceptions in the API docs. There is also now a short paragraph in the top-level README.md.

ziotom78 commented 3 years ago

Great, thanks!