unicode-org / message-format-wg

Developing a standard for localizable message strings

Create complete tests for syntax #843

Open catamorphism opened 1 month ago

catamorphism commented 1 month ago

This is from https://github.com/unicode-org/message-format-wg/wiki/Things-That-Need-Doing . I didn't see an existing issue for it.

Full test coverage is probably impossible given the size of the grammar. But there are a few possibilities:

  1. Manually generate tests from the grammar, with a limited search depth (i.e. expand every production up to a certain depth and write a test that fits it)
  2. Use an exhaustive test generator tool like abnfgen, Eusthasius or Gramtest to generate tests.
  3. Use a fuzzer like abnffuzzer and extract interesting tests to include in the test suite.

As for option 2: with a few modifications, I was able to run the grammar through abnfgen and Eusthasius. The problem is that both of them generate tests that resemble line noise, which suggests that fuzzing might be preferable.

Option 1 has the advantage that it's easy to write readable, minimal test cases. The disadvantage, of course, is that it's tedious.
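
To make the "limited search depth" idea in option 1 concrete, here is a minimal sketch of bounded-depth expansion. It assumes a tiny toy grammar, not the real MF2 ABNF; the `GRAMMAR` table and the `expand` helper are hypothetical names made up for illustration.

```python
from itertools import product

# Toy grammar for illustration only (NOT the MF2 ABNF).
# Each nonterminal maps to a list of alternatives; each alternative is a
# sequence of symbols. Symbols that are not keys of GRAMMAR are literals.
GRAMMAR = {
    "message": [["text"], ["text", "placeholder"], ["placeholder"]],
    "placeholder": [["{", "expr", "}"]],
    "expr": [["$x"], ["$x", " ", ":fn"]],
    "text": [["hi "]],
}

def expand(symbol, depth):
    """Return every string derivable from `symbol` with at most `depth`
    nested nonterminal expansions."""
    if symbol not in GRAMMAR:
        return [symbol]      # literal: nothing left to expand
    if depth == 0:
        return []            # depth budget exhausted, drop this branch
    results = []
    for alternative in GRAMMAR[symbol]:
        parts = [expand(s, depth - 1) for s in alternative]
        # Combine every choice for each symbol in the alternative.
        for combo in product(*parts):
            results.append("".join(combo))
    return results

for test_input in expand("message", 3):
    print(repr(test_input))
```

Each printed string would still need a hand-written expected result, but the list at least shows which shapes a given depth covers. The number of strings grows quickly with depth, which is the tedium mentioned above.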

Any suggestions are welcome.

catamorphism commented 1 month ago

Doing much better than #844 would probably require some automation.

catamorphism commented 1 month ago

Note that completeness also requires exhaustively generating tests that should produce syntax errors, which I have less of an idea how to do systematically (other than through fuzzing).
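
One fuzzing-style way to get candidate error tests is to mutate known-valid inputs and keep only the mutants that a reference recognizer rejects. The sketch below is an assumption-laden illustration: `is_valid` is a regex stand-in for a toy message language, not an MF2 parser, and `mutants` and `error_cases` are hypothetical helpers.

```python
import re

# Stand-in oracle for a toy language (NOT MF2): literal text plus
# placeholders of the form {$name} or {$name :fn}.
VALID = re.compile(r"(?:[^{}]|\{\$[a-z]+(?: :[a-z]+)?\})*")

def is_valid(s):
    """Return True if the whole string matches the toy language."""
    return VALID.fullmatch(s) is not None

def mutants(s):
    """Return single-edit variants: delete one character, or insert a brace."""
    out = set()
    for i in range(len(s)):
        out.add(s[:i] + s[i + 1:])           # delete the character at i
    for i in range(len(s) + 1):
        for ch in "{}":
            out.add(s[:i] + ch + s[i:])      # insert a stray brace at i
    return out

def error_cases(valid_input):
    """Keep only the mutants that the oracle rejects."""
    return sorted(m for m in mutants(valid_input) if not is_valid(m))

for case in error_cases("hi {$x}"):
    print(repr(case))
```

This is not exhaustive, and it depends entirely on trusting the oracle, so in practice the rejected mutants would still need review before going into a test suite.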

catamorphism commented 1 month ago

I wrote a simple random test generator, hard-wired for the MF2 grammar, and was able to find some bugs in the ICU4C parser with the generated test cases.
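
For readers who want the general shape of such a generator, here is a minimal sketch. It is not the generator described above: the grammar is a toy one rather than the MF2 grammar, and `GRAMMAR` and `generate` are hypothetical names.

```python
import random

# Toy grammar for illustration only (NOT the MF2 grammar).
# Symbols that are not keys of GRAMMAR are emitted as literals.
GRAMMAR = {
    "message": [["pattern"]],
    "pattern": [[], ["text", "pattern"], ["placeholder", "pattern"]],
    "placeholder": [["{", "expr", "}"]],
    "expr": [["$", "name"], ["$", "name", " :", "name"]],
    "name": [["x"], ["count"], ["fn"]],
    "text": [["Hello "], ["world"]],
}

def generate(symbol, rng, depth=8):
    """Randomly derive one string from `symbol`.

    Once the depth budget runs out, the shortest alternative is chosen,
    which in this toy grammar is always non-recursive, so the recursion
    terminates."""
    if symbol not in GRAMMAR:
        return symbol
    alternatives = GRAMMAR[symbol]
    if depth <= 0:
        chosen = min(alternatives, key=len)
    else:
        chosen = rng.choice(alternatives)
    return "".join(generate(s, rng, depth - 1) for s in chosen)

# A fixed seed makes any failing input reproducible.
rng = random.Random(843)
for _ in range(5):
    print(repr(generate("message", rng)))
```

Seeding the generator matters more than it looks: when a generated input exposes a parser bug, the same seed reproduces it, and the input can then be minimized by hand and added to the test suite.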