[Question] Why is my npeg parser done, then starts backtracking and fails.

thatrandomperson5 commented 1 year ago

I have this line of my parser: tinput <- tblock * !1 after EOL it should check again for the !1 and end but from the trace below you can see it backs up and starts again then fails. Why is this happening and how can i fix it?

 75|  0| 94|                        |EOL            |chr "\r"                                |***
 79|  0| 94|                        |EOL            |chr "\n"                                |***
 82|  0| 94|                        |EOL            |chr "\r"                                |***
 85|  0| 94|                        |EOL            |any                                     |***
 89|  0| 94|                        |tcontent       |return                                  |***
 93|  0| 94|                        |tblock         |commit 91                               |***
 91|  0| 94|                        |tblock         |choice 94                               |**
 92|  0| 94|                        |tblock         | call tcontent:9                        |***
  9|  0| 94|                        |tcontent       |choice 60                               |***
 10|  0| 94|                        |Static         | capopen ckAction -> 94                 |****
 11|  0| 94|                        |Static         |  capopen ckVal -> 94                   |****
 12|  0| 94|                        |whitespace     |   span {'\t',' '}                      |****
 13|  0| 94|                        |Static         |  capclose ckVal -> 94                  |****
 14|  0| 94|                        |Static         | capclose ckAction -> 94                |****
 15|  0| 94|                        |Alpha          | set {'A'..'Z','a'..'z'}                |****
 95|  0| 94|                        |               |opFail (backtrack)                      |****
 60|  0| 94|                        |Static         |capopen ckAction -> 94                  |***
 61|  0| 94|                        |Static         | capopen ckVal -> 94                    |***
 62|  0| 94|                        |whitespace     |  span {'\t',' '}                       |***
 63|  0| 94|                        |Static         | capclose ckVal -> 94                   |***
 64|  0| 94|                        |Static         |capclose ckAction -> 94                 |***
 65|  0| 94|                        |Alnum          |set {'0'..'9','A'..'Z','a'..'z'}        |***
 95|  0| 94|                        |               |opFail (backtrack)                      |***
 94|  0| 94|                        |tblock         |return                                  |**
 48|  0| 94|                        |tcontent       | choice 59                              |**
 49|  0| 94|                        |tcontent       |  choice 57                             |***
 50|  0| 94|                        |Dedent         |   capopen ckAction -> 94               |****
 51|  0| 94|                        |Dedent         |    capopen ckVal -> 94                 |****
 52|  0| 94|                        |whitespace     |     span {'\t',' '}                    |****
 53|  0| 94|                        |Dedent         |    capclose ckVal -> 94                |****
 54|  0| 94|                        |Dedent         |   capclose ckAction -> 94              |****
 95|  0| 94|                        |               |opFail (backtrack)                      |****
 57|  0| 94|                        |tcontent       | commit 58                              |***
 58|  0| 94|                        |               |opFail (backtrack)                      |**
 60|  0| 51|Fooo:\n    Barr\n    Fo |Static         |capopen ckAction -> 51                  |*
 61|  0| 51|Fooo:\n    Barr\n    Fo |Static         | capopen ckVal -> 51                    |*
 62|  0| 51|Fooo:\n    Barr\n    Fo |whitespace     |  span {'\t',' '}                       |*
 63|  0| 51|Fooo:\n    Barr\n    Fo |Static         | capclose ckVal -> 51                   |*
 64|  0| 51|Fooo:\n    Barr\n    Fo |Static         |capclose ckAction -> 51                 |*
 65|  0| 51|Fooo:\n    Barr\n    Fo |Alnum          |set {'0'..'9','A'..'Z','a'..'z'}        |*
 66|  0| 52|ooo:\n    Barr\n    Foo |value          |capopen ckAction -> 52                  |*
 67|  0| 52|ooo:\n    Barr\n    Foo |value          | capopen ckVal -> 52                    |*
 68|  0| 52|ooo:\n    Barr\n    Foo |value          |  span {'0'..'9','A'..'Z','a'..'z'}     |*
 69|  0| 55|:\n    Barr\n    Foooo: |value          | capclose ckVal -> 55                   |*
 70|  0| 55|:\n    Barr\n    Foooo: |value          | chr ":"                                |*
 71|  0| 56|\n    Barr\n    Foooo:  |               | jump :73                               |*
 73|  0| 56|\n    Barr\n    Foooo:  |               |opFail (backtrack)                      |*
  4|  0| 51|Fooo:\n    Barr\n    Fo |tinput         |any                                     |
  5|  0| 52|ooo:\n    Barr\n    Foo |               |jump :7                                 |
  7|  0| 52|ooo:\n    Barr\n    Foo |               |opFail (error)                          |

zevv commented 1 year ago

Hard to tell without seeing your grammar, any chance you could post a minimal self contained snippet of nim code + grammar that shows this behavior?

thatrandomperson5 commented 1 year ago

I have no idea where the issue lies so it is hard to make a minimal self contained snippet. I will try tho

thatrandomperson5 commented 1 year ago

Is this good enough?

  let testParser = peg("tinput", o: seq[OC]):
    name <- >(+Alpha) * ":":
      o.add ("name", $1, indent())
    value <- >(+Alnum) * !":":
      o.add ("value", $1, indent())
    tcontent <-  (ib.Line(name) * ib.Block(tblock)) | ib.Line(value)
    tblock <- +tcontent
    tinput <- tblock * !1

It’s minimal but not self contained

zevv commented 1 year ago

I'd like to help you debug this, but [lease try to come up with something I can run to inspect what is happening; I can see from the trace that NPeg is backtracking so your input does not match the grammar, but I can not tell what is wrong with your grammar without having your grammar and the subject string you are trying to parse.

thatrandomperson5 commented 1 year ago

This is self contained but not minimal:

  var indentStack = newSeq[int]()

  template top[T](s: seq[T]): T = 
    if s.high == -1:
      0
    else:
      s[s.high]

  grammar "ib":
    EOL <- "\r\n" | "\n" | "\r" | !1
    whitespace <- *Blank

    Indent <- >whitespace: 
        validate len($0) > indentStack.top
        indentStack.add len($0)
    Dedent <- >whitespace:
        let delStack = len($0)
        validate delStack < indentStack.top
        while delStack < indentStack.top:
            discard indentStack.pop
    Static <- >whitespace:
        validate len($0) == indentStack.top
    Line(content) <- ib.Static * content * ib.EOL
    Block(content) <- &ib.Indent * content * &ib.Dedent

  type OC = tuple[kind: string, s: string, indent: int]
  let testParser = peg("tinput", o: seq[OC]):
    name <- >(+Alpha) * ":":
      o.add ("name", $1, indentStack.len)
    value <- >(+Alnum) * !":":
      o.add ("value", $1, indentStack.len)
    tcontent <-  (ib.Line(name) * ib.Block(tblock)) | ib.Line(value)
    tblock <- +tcontent
    tinput <- tblock * !1

  var o = newSeq[OC]()
  const testStr = """Foo:
    FooBar:
        Bar
        Bar2
    Bar3
Fooo:
    Barr
    Foooo:
        Bar4
Bar5"""
  echo testStr
  let r = testParser.match(testStr, o)
  echo o
  doAssert r.ok

thatrandomperson5 commented 1 year ago

@zevv can you run the above successfully?

zevv commented 1 year ago

The parser parses ok, but your Dedent rule fails because of validate delStack < indentStack.top, causing the whole shebang to backtrack to the start.

thatrandomperson5 commented 1 year ago

You were right, just needed to change to this: Block(content) <- &ib.Indent * content * (&ib.Dedent | !1). Thanks!

zevv / npeg

[Question] Why is my npeg parser done, then starts backtracking and fails. #53