pomsky-lang / pomsky

A new, portable, regular expression language
https://pomsky-lang.org
Apache License 2.0

Tests in Pomsky expressions #75

Closed (mpdn closed this issue 1 year ago)

mpdn commented 1 year ago

Is your feature request related to a problem? Please describe.

Regexes can be difficult to get right. Usually you have a kind of modify-test loop where you work on the regex incrementally until you match your desired strings.

Describe the solution you'd like

It would be nice if it was possible to add inline tests in a Pomsky file, i.e. strings that you expect the resulting regex to match/not match, and maybe what capture group values you expect. For complex regexes, you might even want to test sub-expressions.

These tests should probably be executed when compiling an expression. Any failing tests would then fail compilation and a test report would be written to standard error, with the possibility of opting out via something like a --no-tests flag.
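
To make the idea concrete, here is a minimal Python sketch of such a compile-time gate. Everything in it is an assumption for illustration: the pattern is a hand-written translation of an e-mail-like expression, and `run_inline_tests`, `compile_with_tests` and the `no_tests` parameter are hypothetical names, not part of Pomsky.

```python
import re
import sys

# Hand-written regex standing in for the compiler's output (assumption).
PATTERN = r"[^\s@]+@[^\s@]+\.[^\s@]+"

# Hypothetical inline tests: (input string, whether a full match is expected).
TESTS = [
    ("john.doe@mail.box", True),
    ("john.doe@mailbox", False),
]


def run_inline_tests(pattern, tests):
    """Return one failure message per test whose outcome differs from the expectation."""
    regex = re.compile(pattern)
    failures = []
    for text, should_match in tests:
        matched = regex.fullmatch(text) is not None
        if matched != should_match:
            expected = "match" if should_match else "no match"
            failures.append(f"{text!r}: expected {expected}")
    return failures


def compile_with_tests(pattern, tests, no_tests=False):
    """Return the pattern, or raise ValueError after reporting failures to stderr."""
    # `no_tests` mirrors the suggested opt-out flag and skips the checks entirely.
    failures = [] if no_tests else run_inline_tests(pattern, tests)
    for failure in failures:
        print(f"test failed: {failure}", file=sys.stderr)
    if failures:
        raise ValueError(f"{len(failures)} inline test(s) failed")
    return pattern
```

With this shape, a library user who does not want the tests to run on every load could simply pass `no_tests=True`.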

Describe alternatives you've considered

Additional context

It's not clear how inline tests should be treated when using Pomsky as a library. It might not make sense to execute the tests every time a Pomsky expression is loaded.

Aloso commented 1 year ago

Thank you for your feature request!

This is something I'd love to support. For Rust, PCRE and Ruby it would be pretty easy, since these regex implementations can just be statically linked. Supporting Java, JavaScript, Python and C# would be much harder though. One thing we could do is to assume that the user has the required runtime installed:

If the required runtime is not installed, Pomsky should show an error. I'll have to investigate more to come up with a solution.

Aloso commented 1 year ago

@mpdn I'd like to add inline tests, but would like input about the syntax. Here's an idea:

test 'PCRE' {
  'john.doe@mail.box' -> yes;
  'john.doe@mail.box' -> 1='john.doe' domain='mail.box'; # compare capturing groups
  'john.doe@mailbox'  -> no;
}

:(![s '@']+) '@' :domain(![s '@']+ '.' ![s '@']+)

The test keyword will be reserved in the next version, so it can't be used as a variable name anymore. But everything else is still up in the air.

Aloso commented 1 year ago

Updated syntax:

test {
  match 'john.doe@mail.box';                                         # match entire string
  match 'john.doe@mail.box' as { 1: 'john.doe', domain:'mail.box' }; # compare capturing groups
  match 'john.doe@mail.box', 'jdoe@gmail.com!'
     in 'My addresses are john.doe@mail.box and jdoe@gmail.com!';    # compare substring matches
  reject 'john.doe@mailbox';
}

:(![s '@']+) '@' :domain(![s '@']+ '.' ![s '@']+)

Additions to formal grammar:

```py
let Statement =
    | LetDeclaration
    | Modifier
    | Test;

let Test = 'test' '{' TestCase* '}';

let TestCase =
    | TestCaseMatch
    | TestCaseMatchAll
    | TestCaseReject;

let TestCaseMatch = 'match' TestCaseSingleMatch ';';

let TestCaseMatchAll = 'match' (TestCaseSingleMatch (',' TestCaseSingleMatch)*)? 'in' String ';';

let TestCaseReject = 'reject' 'in'? String ';';

let TestCaseSingleMatch = String ('as' '{' TestCaptures? '}')?;

let TestCaptures = TestCapture (',' TestCapture)* ','?;

let TestCapture = (Number | Ident) ':' String;
```

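
For readers who want to see what the test cases above are meant to check, here is a rough Python sketch of one possible reading of the semantics. It is not how Pomsky evaluates tests. The regex is a hand-written translation of the example expression, assuming the final character class is meant to repeat (`![s '@']+`), and the helper names `check_match`, `check_match_in` and `check_reject` are made up for illustration. Treating a plain `reject` as "no match for the entire string" is likewise an assumption.

```python
import re

# Hand-written translation of the example expression (assumption, see lead-in).
EMAIL = re.compile(r"([^\s@]+)@(?P<domain>[^\s@]+\.[^\s@]+)")


def check_match(regex, text, captures=None):
    """`match 'text' as { ... }`: the whole string matches and the listed groups agree."""
    m = regex.fullmatch(text)
    if m is None:
        return False
    # Keys may be group numbers (1) or group names ('domain').
    return all(m.group(key) == value for key, value in (captures or {}).items())


def check_match_in(regex, expected, haystack):
    """`match 'a', 'b' in 'haystack'`: the successive matches equal the expected list."""
    return [m.group(0) for m in regex.finditer(haystack)] == list(expected)


def check_reject(regex, text):
    """`reject 'text'`: the regex must not match the entire string."""
    return regex.fullmatch(text) is None
```

Under this translation, `check_match(EMAIL, 'john.doe@mail.box', {1: 'john.doe', 'domain': 'mail.box'})` holds, because the `domain` group captures `mail.box` for that input. The trailing `!` in `jdoe@gmail.com!` is part of the substring match, since `!` is neither whitespace nor `@`.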
Aloso commented 1 year ago

The syntax is now implemented.

What's missing is a way to execute the tests. For this, tests should be returned in a structured format when requesting JSON output.

I'd also like a --test CLI flag to run the tests with PCRE (statically linked in the binary), and return a non-zero status code if any test fails.
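
As a sketch of how the two pieces could fit together, the Python snippet below reads a compiled pattern and its tests from JSON, runs them, and returns an exit status. The JSON field names (`output`, `tests`, `kind`, `input`) are hypothetical and not Pomsky's actual output format, and Python's `re` module stands in for PCRE here.

```python
import json
import re
import sys

# Hypothetical structured output: the compiled regex plus its test cases.
COMPILED_JSON = json.dumps({
    "output": r"[^\s@]+@[^\s@]+\.[^\s@]+",
    "tests": [
        {"kind": "match", "input": "john.doe@mail.box"},
        {"kind": "reject", "input": "john.doe@mailbox"},
    ],
})


def run_tests(compiled_json):
    """Run every test case and return a process exit status (0 means all passed)."""
    compiled = json.loads(compiled_json)
    regex = re.compile(compiled["output"])
    failed = 0
    for case in compiled["tests"]:
        matched = regex.fullmatch(case["input"]) is not None
        # A "match" case expects a full match; a "reject" case expects none.
        if matched != (case["kind"] == "match"):
            print(f"FAILED: {case['kind']} {case['input']!r}", file=sys.stderr)
            failed += 1
    return 1 if failed else 0
```

A wrapper script could pass the result to `sys.exit(run_tests(COMPILED_JSON))` so that a failing test produces a non-zero status, which is what CI systems typically check.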

fundef1 commented 1 year ago

Great, and thanks! I've been waiting for this.

Aloso commented 1 year ago

Implemented in 3aad8cc and 0bb628a.