waxeye-org / waxeye

Waxeye is a parser generator based on parsing expression grammars (PEGs). It supports C, Java, JavaScript, Python, Racket, and Ruby.
https://waxeye-org.github.io/waxeye/index.html

Different character escaping in 'expected' and 'actual' in tests #123

Open bzaar opened 1 year ago

bzaar commented 1 year ago

Suppose I have this grammar:

test <- 'test'

and this test:

(test

  "test"
  (test t e s t)
)

When I run the test

waxeye.exe -t my-grammar.waxeye.tests my-grammar.waxeye

I get:

Error! @ "test"
input    = "test"
expected = (test t e s t)
actual   = (test t e s t)

This is confusing because 'expected' and 'actual' look exactly the same. But if I change my test to say:

(test

  "test"
  (test #\t #\e #\s #\t)
)

Then it passes!

Obviously, different literal escaping is used when rendering 'expected' and 'actual'. Can they be unified?

To be honest, I'm not a big fan of #\t #\e #\s #\t. Can it be rendered as 'test'?

orlandohill commented 1 year ago

Thanks for reporting this!

I think this is due to a lack of input validation for expected parse results. Back when ANTLR announced grammar testing, I quickly implemented it as a proof of concept. The use of Scheme/Racket s-expressions was just meant to be a temporary solution to avoid committing to an invented syntax. The reason that actual and expected are being printed the same in your example is related to Racket's default printing of symbol and character data types, and how that compares to Waxeye's custom s-expression-like printing of ASTs.
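The effect can be modelled outside Racket. The Python sketch below is illustrative only and is not Waxeye's code: `Sym` and `Char` are hypothetical stand-ins for Racket symbols and characters, and `render` assumes a display-style printer that shows both as bare text. Under that assumption, two trees that print identically still compare unequal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    # Hypothetical stand-in for a Racket symbol, written t in a test file.
    name: str


@dataclass(frozen=True)
class Char:
    # Hypothetical stand-in for a Racket character, written #\t in a test file.
    value: str


def render(node_type, children):
    # Display-style rendering: symbols and characters both appear as bare text.
    parts = [c.name if isinstance(c, Sym) else c.value for c in children]
    return "(" + " ".join([node_type] + parts) + ")"


expected = [Sym(c) for c in "test"]  # how (test t e s t) would be read
actual = [Char(c) for c in "test"]   # what parsing 'test' would produce

print(render("test", expected))  # (test t e s t)
print(render("test", actual))    # (test t e s t)
print(expected == actual)        # False
```

Printing the expected result with its original written syntax (for example `#\t` for characters) would make the difference visible in the error report.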

Both the grammar testing and the modular grammar features could do with a design review. The grammar tester could probably benefit from a revised syntax, and the ability to accept test data in JSON format. The modular grammar functionality can probably be integrated into the core grammar language.

I'll add input validation to expected parse results, and add a shorthand for consecutive characters in an AST's children to the current s-expression syntax. This will make (test t e s t) an invalid expected result, and allow (test "test") as a shorthand for (test #\t #\e #\s #\t). There's already a similar shorthand in the grammar language where test <- 'test' is expanded to test <- 't' 'e' 's' 't'.
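As a rough sketch of how the proposed validation and shorthand could behave, here is a small Python model. It is not the planned implementation: the function name `expand_expected` and the `(kind, value)` encoding of children are assumptions made for this example only.

```python
def expand_expected(children):
    # Normalize the children of an expected AST node into single characters.
    # Hypothetical encoding of each child, chosen for this sketch:
    #   ("str", "test") -> string shorthand, written "test"
    #   ("char", "t")   -> character literal, written #\t
    #   ("sym", "t")    -> bare symbol, written t (rejected as invalid)
    out = []
    for kind, value in children:
        if kind == "str":
            # The shorthand expands to one character child per letter.
            out.extend(("char", ch) for ch in value)
        elif kind == "char" and len(value) == 1:
            out.append(("char", value))
        else:
            raise ValueError(f"invalid child in expected result: {value!r}")
    return out


shorthand = expand_expected([("str", "test")])
longhand = expand_expected([("char", c) for c in "test"])
print(shorthand == longhand)  # True

try:
    expand_expected([("sym", c) for c in "test"])
except ValueError as err:
    print(err)  # invalid child in expected result: 't'
```

With a check like this, (test t e s t) would fail up front with a clear message instead of producing an expected/actual pair that looks identical.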