scymtym / esrap

Common Lisp packrat parser
https://scymtym.github.io/esrap/

Get "start position" of failed parse rule #2

Closed: mister-walter closed this issue 1 year ago

mister-walter commented 5 years ago

I'm not sure how feasible this is, but it would be helpful to be able to get the position at which a failed parse rule started matching. For example, given the rules:

(defrule natural (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))
  (:text t))

(defrule numlist (and #\( natural (* (and ", " natural)) #\))
  (:text t))

the following call signals an error:

(parse 'numlist "(1, 2, 3")
debugger invoked on a ESRAP:ESRAP-PARSE-ERROR in thread
#<THREAD "main thread" RUNNING {10005305B3}>:
  At end of input

  (1, 2, 3
          ^ (Line 1, Column 8, Position 8)

In context NUMLIST:

While parsing NUMLIST. Expected:

     the character ) (RIGHT_PARENTHESIS)
  or the string ", "
  or the character 0 (DIGIT_ZERO)
  or the character 1 (DIGIT_ONE)
  or the character 2 (DIGIT_TWO)
  or the character 3 (DIGIT_THREE)
  or the character 4 (DIGIT_FOUR)
  or the character 5 (DIGIT_FIVE)
  or the character 6 (DIGIT_SIX)
  or the character 7 (DIGIT_SEVEN)
  or the character 8 (DIGIT_EIGHT)
  or the character 9 (DIGIT_NINE)

In this case, it's easy to see where the problem lies because it's all one line, and there's no nesting. However, in cases where the list spans multiple lines, it's often more useful to see the start of the offending list.

I poked around a little bit but couldn't see any built-in way to get this information (the start position of the rule that ended up failing).

scymtym commented 5 years ago

Thank you for the suggestion.

I decided against recording the start position of every failed parse for performance reasons.

Maybe something can be done for the innermost failed rule. I'll think about it and report back.

mister-walter commented 5 years ago

Just a heads up: after playing around, I discovered I can already get this information by digging into the second-level result-detail of the error's context. It contains a list of the results for the sequence rule, including the successful parses that happened before the failed one. From this I can get the position of the sequence's "start" character and check that it was parsed successfully.

(esrap::result-detail
 (esrap::result-detail
  (esrap:esrap-parse-error-context
   (handler-case (parse 'numlist "(1, 2, 3")
     (error (c) c)))))
=>
(#<ESRAP::SUCCESSFUL-PARSE "(" @1> #<ESRAP::SUCCESSFUL-PARSE NATURAL @2>
 #<ESRAP::SUCCESSFUL-PARSE # @8> #<ESRAP::FAILED-PARSE ")" @8>)
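
The digging above could be wrapped in a small helper. This is only a sketch: failed-sequence-results is a hypothetical name, not part of esrap, and esrap::result-detail is an internal (unexported) symbol, so it may change between esrap versions. The two nested result-detail calls match the nesting seen for this particular grammar; other grammars may well need a different depth.

```lisp
;; Assumes esrap is already loaded, e.g. with (asdf:load-system "esrap").

;; The two rules from the start of this thread, repeated so the
;; snippet stands on its own.
(esrap:defrule natural (+ (or "0" "1" "2" "3" "4" "5" "6" "7" "8" "9"))
  (:text t))

(esrap:defrule numlist (and #\( natural (* (and ", " natural)) #\))
  (:text t))

;; Hypothetical helper (not an esrap function).
(defun failed-sequence-results (rule text)
  "Return the per-element results of the sequence that failed while
parsing TEXT with RULE, or NIL if the parse succeeds."
  (handler-case
      (progn
        (esrap:parse rule text)
        nil)                            ; parse succeeded: nothing to report
    (esrap:esrap-parse-error (condition)
      ;; RESULT-DETAIL is internal to esrap; two levels is what this
      ;; grammar happens to need, and is an assumption for any other.
      (esrap::result-detail
       (esrap::result-detail
        (esrap:esrap-parse-error-context condition))))))
```

With the input from above, (first (failed-sequence-results 'numlist "(1, 2, 3")) should be the successful parse of "(" shown in the printed list, which is the entry that marks where the list started.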