Wield grammar error (the same grammar works in python lark) - Githubissues

jamesrhester / Lerche.jl

A Julia port of the Lark parser

MIT License

45 stars 3 forks source link

Wield grammar error (the same grammar works in python lark) #29

Closed GiggleLiu closed 1 year ago

GiggleLiu commented 1 year ago

Hi, thanks for creating this nice tool. Everything works well except when parsing the following lines

typename : var ("." var)* "." var
funcname : var ("." var)*  ".#" var 
var : /[^\W0-9]\w*/

I see the following error message

julia> p = Lark(read("src/jugsawir.lark", String),parser="lalr",lexer="contextual", start="object");
ERROR: GrammarError("Reduce/Reduce collision in DOT between the following rules: \n\t- <__funcname_star_2:  DOT var>\n\t- <__typename_star_1:  DOT var>")
Stacktrace:
 [1] compute_lalr1_states(l::Lerche.LALR_Analyzer)
   @ Lerche ~/.julia/dev/Lerche/src/parsers/lalr_analysis.jl:325
 [2] compute_lalr!
   @ ~/.julia/dev/Lerche/src/parsers/lalr_analysis.jl:352 [inlined]
 [3] Lerche.LALRParser(parser_conf::Lerche.ParserConf; debug::Bool)
   @ Lerche ~/.julia/dev/Lerche/src/parsers/lalr_parser.jl:47

Do you have any suggestions to resolve the issue?

jamesrhester commented 1 year ago

Thanks for finding this bug. I will need a few days to look at it and track down where the difference with Lark is coming from. You could try assigning different priorities to typename and funcname, which shouldn't make a difference in correct code but the bug may be in the part of the code that works with rule priorities.

GiggleLiu commented 1 year ago

Wield, setting priority does not work at all. Please also check #30

jamesrhester commented 1 year ago

Working on this. The reason for the difference with Lark, and the failure, is that the regular expression for SIGNED_FLOAT is placed before INT in the regular expression list of alternatives for the Lexer by Lark, but the opposite is true for Lerche, and so INT matches first for Lerche. Lark and Lerche both immediately expand out the production for SIGNED_FLOAT into a regular expression, so that LALR grammar reduction is not performed "live". While this may provide a useful speedup for Lark, it is probably unnecessary for Lerche.

jamesrhester commented 1 year ago

So one workaround is to turn all the numeric productions in grammars/common.lark into non-terminals, e.g.

       DIGIT: "0".."9"
       hexdigit: "a".."f"|"A".."F"|"0".."9"

       INT: /[0-9]+/
       signed_int: ["+"|"-"] INT
       decimal: INT "." INT? | "." INT
       _exp: ("e"|"E") signed_int
       float: INT _exp | decimal _exp?
       signed_float: ["+"|"-"] float

jamesrhester commented 1 year ago

Accidentally closed.

jamesrhester commented 1 year ago

The particular issue with floating point productions has been solved in 924c18e. I will now focus on the original issue.

GiggleLiu commented 1 year ago

Nice, thank you so much!

jamesrhester commented 1 year ago

I've tracked down the source of the original bug: identical trees were giving different hash values and therefore equality test was failing. Fix to follow.

jamesrhester commented 1 year ago

Fixed in v0.5.4 (3e241a0)