custom error messages? - Githubissues

it would be nice to have an option in the defrule macro to add custom error messages.

in the case of complex rules there might be a domain-specific way of describing them that is simpler than the automatically generated one, and in the rules where we have internal logic we don't want to disclose (e.g., I have a rule that I publish as being stricter as it really is, like requiring one space where the actual rule will accept any number of spaces or tabs).

do you think this is worthwhile?

ps: I notice lots of places (like the awesome-cl list used to until today) still link to https://github.com/nikodemus/esrap, even though this is the current quicklisp distribution; if you have nikodemus blessing, it would be nice to have him write something on the readme of his fork so that people are aware!

it would be nice to have an option in the defrule macro to add custom error messages.

in the case of complex rules there might be a domain-specific way of describing them that is simpler than the automatically generated one, and in the rules where we have internal logic we don't want to disclose (e.g., I have a rule that I publish as being stricter as it really is, like requiring one space where the actual rule will accept any number of spaces or tabs).

do you think this is worthwhile?

I think this is a good idea (i have considered it previously) but has to be designed carefully.

Two possible designs come to mind immediately:

An option such as (:report "…") for defrule
A new expression kind, say (fail "…"), which would be used as (or EXPRESSION (fail "…"))

The first option is less invasive and probably easier to implement, but I suspect the second one to be more useful.

For the second option, an important design decision is whether encountering a fail expression should make the entire parse fail. I lean towards yes since custom error reports would probably be used in cases that definitely are syntax errors. For example:

(defrule python-class-statement
  (and "class" (+ whitespace)
       (or identifier (fail "Class name must follow class keyword")) …)
  …)

What do you think?

ps: I notice lots of places (like the awesome-cl list used to until today) still link to https://github.com/nikodemus/esrap, even though this is the current quicklisp distribution; if you have nikodemus blessing, it would be nice to have him write something on the readme of his fork so that people are aware!

I relayed this suggestion to Nikodemus.

thanks for the reply!

okay, I've thought some more and I think I misnamed the issue; I was actually thinking of manually labelling rule for the benefits I described.

but your idea for a new expression kind is also interesting, and certainly much more flexible. plus it corresponds better to the title of my issue (sorry about being confusing!)

I was looking into megaparsec and it has constructs equivalent to both things (label for :report, and failure & company for the fail expression). my intuition is that the former is more lightweight and composes better, but the second is much more flexible, but might involve even more manual work.

in the example you gave, you would be using fail to specialize the error message, which by default might be something complicated. but if you use the identifier rule in lots of places, maybe you don't want to specialize errors messages in all of them, but also don't want the default error message (which might be not so user-friendly, or because you want to hide implementation details); that's when you'd use a :report option, which in this case could be Identifiers must be [a-z]+

okay, I've thought some more and I think I misnamed the issue; I was actually thinking of manually labelling rule for the benefits I described.

I see.

but your idea for a new expression kind is also interesting, and certainly much more flexible. plus it corresponds better to the title of my issue (sorry about being confusing!)

I was looking into megaparsec and it has constructs equivalent to both things (label for :report, and failure & company for the fail expression). my intuition is that the former is more lightweight and composes better, but the second is much more flexible, but might involve even more manual work.

In Esrap, there are (at least) the following dimensions which could be affected by these new constructs:

Does the parse proceed normally when encountering the construct or does it indicate a "fatal" error?
How is the source location given in the error report affected?
How are the context and problem description in the error report affected?
How is the "expected …" part of the error report affected.

Megaparsec's label seems to replace the automatically generated "expected …" part with its argument (4.). Is that also what you are suggesting for :report or should it affect other parts of the error report as well?

failure seems to be "fatal" (1.) and replace the whole error message (some or all of 2., 3. and 4.). I'm not sure what the most useful behavior w.r.t. 1. is.

in the example you gave, you would be using fail to specialize the error message, which by default might be something complicated. but if you use the identifier rule in lots of places, maybe you don't want to specialize errors messages in all of them, but also don't want the default error message (which might be not so user-friendly, or because you want to hide implementation details); that's when you'd use a :report option, which in this case could be Identifiers must be [a-z]+

I explored this some more and ended up with the following example:

;;;; Esrap example: custom error messages

(cl:require :esrap)

(cl:defpackage #:esrap-example.custom-errors
  (:use #:cl #:esrap))

(cl:in-package #:esrap-example.custom-errors)

(defrule identifier
    (+ (character-ranges (#\a #\z) #\_)))

(defrule identifier!
    (or identifier
        (esrap::fail "Invalid characters in identifier"
                     "Allowed characters are [a-z] and _")))

(defrule python-class
    (and "class" parser.common-rules:whitespace+
         (or identifier (esrap::fail "An identifier, the class name, must follow the `class' keyword."
                                     "An identifier"))
         (? (and parser.common-rules:whitespace*  #\( identifier! #\)))))

(parse 'python-class "class 1")
#|
At

  class 1
        ^ (Line 1, Column 6, Position 6)

In context PYTHON-CLASS:

While parsing PYTHON-CLASS. Problem:

  An identifier, the class name, must follow the `class' keyword.

Expected:

     "An identifier"
  or a character in [a-z] or [_]
|#

(parse 'python-class "class foo (1)")
#|
At

  class foo (1)
             ^ (Line 1, Column 11, Position 11)

In context IDENTIFIER!:

While parsing IDENTIFIER!. Problem:

  Invalid characters in identifier

Expected:

     "Allowed characters are [a-z] and _"
  or a character in [a-z] or [_]
|#

Is this close to what you want to achieve in terms of 2., 3. and 4. or do you want to replace the error report more completely (say getting rid of the "In context …" and "While parsing …" parts)?

This addresses your valid concern of extra work at each use of the identifier rule by cheating a bit and defining identifier! using fail. I do think a trick such as identifier! would be needed either way since the detailed error message for the class name only works properly if the identifier rule does not already provide a custom error message.

With a :report option, identifier! could be defined as

(defrule identifier!
    identifier
  (:report :complaint "Invalid characters in identifier"
           :expected  "Allowed characters are [a-z] and _"))

(defrule identifier!
    identifier
  (:report "Invalid characters in identifier"
           "Allowed characters are [a-z] and _"))

instead. Do you think having two ways of specifying custom error messages is worth it?

This only covers 3. and 4. in the list above since source locations of errors are not directly affected. I did not consider fatal failures and that issue might in fact be completely orthogonal.

Megaparsec's label seems to replace the automatically generated "expected …" part with its argument (4.). Is that also what you are suggesting for :report or should it affect other parts of the error report as well?

I'd be happy with customizing only this part of the error report! more information doesn't hurt at all, even if an end-user can't understand it well.

Is this close to what you want to achieve in terms of 2., 3. and 4. or do you want to replace the error report more completely (say getting rid of the "In context …" and "While parsing …" parts)?

so yeah, your example looks great to me!

I do think a trick such as identifier! would be needed either way since the detailed error message for the class name only works properly if the identifier rule does not already provide a custom error message.

I'm not sure I'm following exactly, but taking your example I thought of something like this

(defrule identifier
    (+ (character-ranges (#\a #\z) #\_))
  (:report "Invalid characters in identifier"
           "Allowed characters are [a-z] and _"))

(defrule python-class
    (and "class" parser.common-rules:whitespace+
         (or identifier (esrap::fail "An identifier, the class name, must follow the `class' keyword."
                                     "An identifier"))
         (? (and parser.common-rules:whitespace*  #\( identifier! #\)))))

with the behaviour being that in the case of the python-class rule there is no failure of the identifier rule, since we are in an or expression, so the error we would get in case of failure would be the one specified by failure; in a normal use of the identifier rule, the user would get the error specified by the :report option.

that would give the user a way of providing a custom error message (the :report option), and a way of specializing/overring it (the fail expression). so I guess that's my answer to your question

Do you think having two ways of specifying custom error messages is worth it?

even though it seems to me we could define one in terms of the other, both seem useful to me in terms of syntax; with only :report we would have to create specialized rules like (defrule identifier-in-class ...) to have a specialized error message. if we just have fail, it's just that having a report option seems nicer syntax to me, or else we'd be repeating (or something (fail "...")) all over, which has less clear intent.

it's okay if you feel like you need more time to think of a cleaner design :)

scymtym / esrap

custom error messages? #3