nextjournal / lezer-clojure

This is a Clojure grammar for the lezer parser system.
ISC License

failing: parse unmatched closing brackets #9

Closed mhuebert closed 2 years ago

mhuebert commented 4 years ago

A failed attempt to allow unmatched closing brackets to be parsed. It results in the error: Inconsistent skip sets after "#_" "[" "]"

marijnh commented 4 years ago

I suspect it'll help if you exclude the stray closing tokens from the expressions allowed after #_ (possibly by splitting expression into realExpression and expression { realExpression | "}" | "]" | ")" })

mhuebert commented 4 years ago

Thanks for the tip!

> I suspect it'll help if you exclude the stray closing tokens from the expressions allowed after #_ (possibly by splitting expression into realExpression and expression { realExpression | "}" | "]" | ")" })

Trying that here, I get the same error: https://github.com/lezer-parser/clojure/pull/9/commits/c26129667a080b9f88c9c2557969fa4c5098e962

It seems related to the nesting of these closing brackets. In https://github.com/lezer-parser/clojure/pull/9/commits/9c8b70055dba758a1005b5f8335f62d419235101, Discard accepts maybeInvalidExpression at the top level without an error, but this is no longer the desired grammar, because closing brackets are no longer included in expression recursively.

marijnh commented 4 years ago

Ah, right, I misdiagnosed the issue. What I think is happening is this: because the analysis at that level can't see that the precedence will prevent it, after #_() the parser might be after a skipped expression (which is a state in the skip rule, so it does not itself skip content), or it might still be in the list, having just skipped a stray closing paren, at which point the regular global skip rules apply. So that state wouldn't know what to do with, say, a whitespace token.

What is the reason for encoding invalid syntax in the grammar?

mhuebert commented 4 years ago

> What is the reason for encoding invalid syntax in the grammar?

I've been implementing auto-close / bracket-matching, with some differences from built-in versions. To the extent possible I'm trying to implement it strictly in terms of the parse tree, without backing off to reading the doc as a string.

I'm using unmatched open-brackets in a couple of ways:

If closing-brackets are present in the tree even when invalid, they are useful for similar navigations. E.g., when inserting (, we can first check for an unmatched close-bracket among the right siblings of the current node and, if one is found, insert only ( so that it forms a balanced collection with that bracket, instead of inserting ().
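
To make the sibling check above concrete, here is a rough TypeScript sketch. It is not code from this PR. The TreeNode interface is a hypothetical stand-in that only mirrors the name and nextSibling properties of Lezer's SyntaxNode, and the node names StrayCloseParen, StrayCloseBracket and StrayCloseBrace are invented for illustration; the actual grammar may name stray closers differently. The chain helper exists only to build sample data.

```typescript
// Hypothetical minimal node shape (an assumption for this sketch). It mirrors
// the `name` and `nextSibling` properties of Lezer's SyntaxNode.
interface TreeNode {
  name: string;
  nextSibling: TreeNode | null;
}

// Closing character for each opening bracket.
const CLOSERS: Record<string, string> = { "(": ")", "[": "]", "{": "}" };

// Hypothetical node names for stray (unmatched) closing brackets.
const STRAY_CLOSER_NAME: Record<string, string> = {
  "(": "StrayCloseParen",
  "[": "StrayCloseBracket",
  "{": "StrayCloseBrace",
};

// Walk the right siblings of `node`, looking for a stray closer that
// matches the opening bracket `open`.
function hasStrayCloser(node: TreeNode, open: string): boolean {
  const wanted = STRAY_CLOSER_NAME[open];
  for (let cur = node.nextSibling; cur !== null; cur = cur.nextSibling) {
    if (cur.name === wanted) return true;
  }
  return false;
}

// Decide what to insert when the user types `open` at `node`: only the
// opening bracket if a stray closer can balance it, otherwise the pair.
function textToInsert(node: TreeNode, open: string): string {
  return hasStrayCloser(node, open) ? open : open + CLOSERS[open];
}

// Sample-data helper: links the names into a sibling chain and returns
// the first node (or null for an empty list).
function chain(names: string[]): TreeNode | null {
  return names.reduceRight<TreeNode | null>(
    (next, name) => ({ name, nextSibling: next }),
    null
  );
}
```

With a sibling chain such as chain(["Symbol", "Number", "StrayCloseParen"]), textToInsert on the first node with "(" returns "(", while the same call on chain(["Symbol", "Number"]) returns "()". The stray closers need a node name distinct from matched closers here, since a matched closing bracket could otherwise also show up among the right siblings.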