lark-parser / lark

Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
MIT License
4.62k stars 395 forks source link

Fix SymbolNode.end for completed tokens #1432

Closed chanicpanic closed 1 week ago

chanicpanic commented 1 week ago

Fixes #1431.

I realized that that bug helped #1427 work: multiple solutions had different end values and were considered distinct. After fixing the bug, the end values of solutions are the same, and the solutions get deduped being considered equal. However, we know that the solutions have different children and should be considered distinct. For simplicity, I updated SymbolNode to use the default identity-based equality. Let me know if you have any other ideas for its __eq__ and __hash__ implementations.

erezsh commented 1 week ago

LGTM, but my understanding of this code is limited. What do you think are the chances that this will create duplicate solutions?

chanicpanic commented 1 week ago

I think the chances are low. I would hope that if duplicate solutions could be created, it would have happened for some of the existing tests causing them to fail.

From my (imperfect) understanding of the code, I believe that there cannot be more than two different solution nodes at the end:

  1. One for completed start rule alternatives that ended in a non-terminal
  2. One for completed start rule alternatives that ended in a terminal

Within those two groups, ambiguities are handled as usual and SymbolNode creation is limited by the node_caches.

erezsh commented 1 week ago

Thanks!