Different character escaping in 'expected' and 'actual' in tests

waxeye-org / waxeye

Waxeye is a parser generator based on parsing expression grammars (PEGs). It supports C, Java, JavaScript, Python, Racket, and Ruby.

Suppose I have this grammar:

test <- 'test'

and this test:

(test

  "test"
  (test t e s t)
)

When I run the test

waxeye.exe -t my-grammar.waxeye.tests my-grammar.waxeye

I get:

Error! @ "test"
input    = "test"
expected = (test t e s t)
actual   = (test t e s t)

This is confusing because 'expected' and 'actual' look exactly the same. But if I change my test to say:

(test

  "test"
  (test #\t #\e #\s #\t)
)

Then it passes!

Obviously, different literal escaping is used when rendering 'expected' and 'actual'. Can they be unified?

To be honest, I'm not a big fan of #\t #\e #\s #\t. Can it be rendered as 'test'?

Thanks for reporting this!

I think this is due to a lack of input validation for expected parse results. Back when ANTLR announced grammar testing, I quickly implemented it as a proof of concept. The use of Scheme/Racket s-expressions was just meant to be a temporary solution to avoid committing to an invented syntax. The reason that actual and expected are being printed the same in your example is related to Racket's default printing of symbol and character data types, and how that compares to Waxeye's custom s-expression-like printing of ASTs.
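One way to picture the mismatch described above is a small Python model. It is only an illustrative sketch, not Waxeye's actual code: `Symbol`, `Char`, and `render` are hypothetical names, and it assumes the bare `t` in the test file is read as a symbol while the parser produces characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Symbol:
    """Hypothetical stand-in for a symbol read from the test file (bare t)."""
    name: str


@dataclass(frozen=True)
class Char:
    """Hypothetical stand-in for a character produced by the parser (#\\t)."""
    value: str


def render(node_type, children):
    """Print an AST the way the error message does: no marker for the child's type."""
    parts = [c.name if isinstance(c, Symbol) else c.value for c in children]
    return "(" + " ".join([node_type] + parts) + ")"


expected = [Symbol(ch) for ch in "test"]  # what (test t e s t) would be read as
actual = [Char(ch) for ch in "test"]      # what parsing "test" would produce

same_text = render("test", expected) == render("test", actual)
same_value = expected == actual

print(render("test", expected))
print(render("test", actual))
print("same text:", same_text, "| same value:", same_value)
```

Under these assumptions both trees render to the same text, yet they compare as unequal because the children have different types. That matches the symptom in the report: the test fails while 'expected' and 'actual' look identical.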

Both the grammar testing and the modular grammar features could do with a design review. The grammar tester could probably benefit from a revised syntax, and the ability to accept test data in JSON format. The modular grammar functionality can probably be integrated into the core grammar language.

I'll add input validation to expected parse results, and add a shorthand for consecutive characters in an AST's children to the current s-expression syntax. This will make (test t e s t) an invalid expected result, and allow (test "test") as a shorthand for (test #\t #\e #\s #\t). There's already a similar shorthand in the grammar language, where test <- 'test' is expanded to test <- 't' 'e' 's' 't'.
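The proposed shorthand could be sketched roughly as follows. This is a hypothetical Python model of the expansion, not the planned implementation: `expand_children` and `format_expected` are made-up names, and the sketch ignores characters that Racket writes with a name, such as #\space.

```python
def expand_children(children):
    """Expand each string child into one character literal per character.

    Non-string children (for example nested ASTs) are passed through unchanged.
    """
    expanded = []
    for child in children:
        if isinstance(child, str):
            # "test" becomes #\t #\e #\s #\t
            expanded.extend("#\\" + ch for ch in child)
        else:
            expanded.append(child)
    return expanded


def format_expected(node_type, children):
    """Render the expanded form as an s-expression string."""
    return "(" + " ".join([node_type] + expand_children(children)) + ")"


print(format_expected("test", ["test"]))
```

With this model, the string child in (test "test") expands to the four character literals of the long form, mirroring how test <- 'test' expands to test <- 't' 'e' 's' 't' in the grammar language.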

Different character escaping in 'expected' and 'actual' in tests #123