ohmjs / ohm

A library and language for building parsers, interpreters, compilers, etc.
MIT License
4.97k stars 219 forks source link

Indentation-sensitive grammar forgets consumed dedents #465

Open simonwo opened 8 months ago

simonwo commented 8 months ago

Hi, I'm using Ohm v17.1.0 to build a whitespace-sensitive grammar using the experimental indentation support. I am struggling to get the following MVE grammar to correctly parse the example:

Blocks <: IndentationSensitive {
    Object = Block<ObjectHead, ObjectBody>
    ObjectHead = "key:"
    ObjectBody = ObjectHead "value"

    Block<Head, Body> = Head newline indent BlockBody<Head, Body> dedent
    BlockBody<Head, Body> = Block<Head, Body> BlockBody<Head, Body> -- block
                          | Body newline BlockBody<Head, Body>  -- stmt                          
                          | space* -- zero

    newline = "\n" "\r"?    
    space := " " | "\t"
}

And the example:

key:
    key:
        key: value
        key: value
    key: value

The parse fails at line 5 col 5 with Expected a dedent, a space, or "key:", which is weird because it should be able to consume a dedent, or key: if it has just consumed one. After banging my head against the desk for a while, I've stepped through the parsing code and whilst it does consume the dedent, when it returns to the same position to continue the parse it seems to still be expecting one. AFAICS the result of attempting to parse at this position is memoized when it should not be, because the memoized version does not take into account the consumed dedent.

Specifically, if I comment out this line which resets memoization for a match state that has it enabled https://github.com/ohmjs/ohm/blob/39ccead882ad35dad73d99dea854ff60e7146291/packages/ohm-js/src/pexprs-eval.js#L277-L279 ...the parse works as expected.

However, I'm not really sure why this line is here so not sure if I've found a bug or have just two-wrongs-make-a-righted myself.

I'd be happy to work on and submit a patch for this, but I could do with a positive confirmation that this is a problem, and pointers to any other areas that I might need to change as result. No doubt it's all more complicated than I understand at the moment! Thanks!

simonwo commented 6 months ago

FYI this patch solves the issue for me: 01-keep-memoization-disabled.patch

And this one provides some more debugging information: 02-output-indents-in-trace.patch