lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License

Resolution order changed #1456

Open Conchylicultor opened 3 weeks ago

Conchylicultor commented 3 weeks ago

Our project worked well with a past version of lark, but it seems to be broken with the current version. Reproduction:

import lark

GRAMMAR = """
?start: shape

shape: (dim (" " dim)*)?

// TODO: Add expressions (+, /, *, -)
?dim: UNKNOWN_DIM
    | ELLIPSIS_DIM
    | named_dim
    | STATIC_DIM
    | var_dim

var_dim: "*" CNAME
UNKNOWN_DIM: "_"
ELLIPSIS_DIM: "..."
named_dim: CNAME
STATIC_DIM: INT

// Defined in `lark/grammars/common.lark`
%import common.CNAME
%import common.INT
"""

parser = lark.Lark(GRAMMAR)
print(parser.parse('_ 3 n'))

As you can see, _ is now parsed as named_dim rather than UNKNOWN_DIM. But the grammar defines UNKNOWN_DIM before named_dim, so I would expect the resolution order to follow the order of the alternatives in the grammar.

We tried ambiguity='resolve', but this didn't change anything.

Why did the resolution order change? Is there a param to fix the issue?

erezsh commented 3 weeks ago

Hello @Conchylicultor,

Can you please check if this PR fixes your issue?

https://github.com/lark-parser/lark/pull/1451

erezsh commented 3 weeks ago

To your question:

> Why did the resolution order change?

Hard to say, but I imagine it probably happened between 0.12.x and 1.0.0.

We have made a lot of improvements to the Earley parser (and we still do), and it's possible that the order of the derivations changed along the way (though we try to keep that to a minimum).
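
For context, the reported input has more than one derivation because the text "_" is matched by both UNKNOWN_DIM and CNAME. The sketch below illustrates this with the standard library only. It assumes that CNAME in common.lark is defined as ("_"|LETTER) ("_"|LETTER|DIGIT)* over ASCII letters; the regular expressions are hand-written approximations, not something Lark exposes, and matching_terminals is a hypothetical helper.

```python
import re

# Hand-written approximations of the two terminals from the grammar.
# Assumption: CNAME is ("_"|LETTER) ("_"|LETTER|DIGIT)* over ASCII letters.
TERMINALS = {
    "UNKNOWN_DIM": re.compile(r"_"),
    "CNAME": re.compile(r"[A-Za-z_][A-Za-z0-9_]*"),
}


def matching_terminals(text):
    """Return the names of all terminals whose pattern matches `text` in full."""
    return [name for name, pattern in TERMINALS.items() if pattern.fullmatch(text)]


print(matching_terminals("_"))  # ['UNKNOWN_DIM', 'CNAME'] -> two candidate derivations
print(matching_terminals("n"))  # ['CNAME'] -> only one candidate
```

Since both terminals accept "_", the parser has to pick between the UNKNOWN_DIM alternative and the named_dim alternative, and that choice is what appears to have changed between versions.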

> Is there a param to fix the issue?

Usually, using a priority is the easiest way to choose between derivations (e.g. preferred_rule.100: subrule1 subrule2 ..).

Also consider using ambiguity='explicit' and choosing the correct derivations on your side.