m4rw3r / chomp

A fast monadic-style parser combinator designed to work on stable Rust.
Apache License 2.0

Generalization of `parse!` and improvements #31

Closed m4rw3r closed 8 years ago

m4rw3r commented 8 years ago

Generalizes parts of the syntax (e.g. enables ret and err to be used in let-statements) and allows operators in the expressions of let-statements and actions.

New grammar

Block     ::= Statement* Expr
Statement ::= Bind ';'
            | Expr ';'
Bind      ::= 'let' Var '=' Expr
Var       ::= $pat
            | $ident ':' $ty

Expr      ::= ExprAlt
            | ExprAlt ">>" Expr
ExprAlt   ::= ExprSkip
            | ExprSkip "<|>" ExprAlt
ExprSkip  ::= Term
            | Term "<*" ExprSkip

/* Needs to be followed by , or ; because of trailing $expr on Ret, Err and Inline.
   Alternatively be wrapped in parentheses */
Term      ::= Ret
            | Err
            | Inline
            | Named
            | '(' Expr ')'

Ret       ::= "ret" Typed
            | "ret" $expr
Err       ::= "err" Typed
            | "err" $expr
Typed     ::= '@' $ty ',' $ty ':' $expr
Inline    ::= $ident "->" $expr
Named     ::= $ident '(' ($expr ',')* (',')* ')'

NOTE: This change is backwards incompatible, as it no longer allows trailing semicolons in parse! macros.

Operators

These operators have the same precedence as in Haskell but are right-associative (at the time of writing). The right-associativity and operator precedence follow from the fact that Rust macros parse from left to right, so plain tt-munching to chain the operators together keeps that order.
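To illustrate why plain tt-munching yields right-associativity, here is a minimal sketch. The `group!` macro and the `grouping_demo` function are hypothetical names for this example only and are not part of chomp's `parse!`; the macro just renders the grouping as a string instead of building parsers.

```rust
// Hypothetical toy macro, NOT chomp's `parse!`: it renders the grouping that
// naive left-to-right tt-munching produces for a chain of `>>` operators.
macro_rules! group {
    // A single remaining term: nothing left to group.
    ($term:ident) => {
        stringify!($term).to_string()
    };
    // Peel off the first term and its operator, then recurse on the rest.
    // The recursion nests to the right, hence right-associativity.
    ($head:ident >> $($rest:tt)+) => {
        format!("({} >> {})", stringify!($head), group!($($rest)+))
    };
}

// Hypothetical helper that exposes the result of the expansion.
pub fn grouping_demo() -> String {
    group!(a >> b >> c)
}
```

Because each step consumes one term from the left and hands the remainder to the recursive call, the remainder always ends up as the right-hand operand.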

Examples

I am not sure if I want to make the operators left-associative, though there are cases both for and against:

parse!{i; decimal() <* token(b';') >> token(b' ')}
// will be interpreted as
decimal(i).skip(|i| token(i, b';')).then(|i| token(i, b' '))
// which is the same as in haskell
// decimal <* token ';' >> token ' '

parse!{i; decimal() <* token(b';') <|> any()}
// will be interpreted as
or(i, |i| decimal(i).skip(|i| token(i, b';')), any)
// same in Haskell:
// decimal <* token ';' <|> any
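The two expansions above can be tried out with a toy model. This is only a sketch under loud assumptions: `Res`, and the free functions `skip`, `then` and `or`, are stand-ins written for this example, and chomp's real `Input` and `ParseResult` types and method signatures differ. The toy `decimal` is also limited to `u8` so that both branches of `or` have the same type.

```rust
// Toy model only: chomp's real `Input`/`ParseResult` types are different.
// A result is the remaining input plus a value, or `None` on failure.
pub type Res<'a, T> = Option<(&'a [u8], T)>;

// Matches exactly the byte `t`.
pub fn token(i: &[u8], t: u8) -> Res<'_, u8> {
    match i.split_first() {
        Some((&b, rest)) if b == t => Some((rest, b)),
        _ => None,
    }
}

// Matches any single byte.
pub fn any(i: &[u8]) -> Res<'_, u8> {
    i.split_first().map(|(&b, rest)| (rest, b))
}

// Matches one or more ASCII digits; fails if the value overflows a u8.
pub fn decimal(i: &[u8]) -> Res<'_, u8> {
    let n = i.iter().take_while(|b| b.is_ascii_digit()).count();
    if n == 0 {
        return None;
    }
    let mut v: u8 = 0;
    for &b in &i[..n] {
        v = v.checked_mul(10)?.checked_add(b - b'0')?;
    }
    Some((&i[n..], v))
}

// `a <* b`: run `b` after `a`, keep the value of `a`.
pub fn skip<'a, T, U>(r: Res<'a, T>, f: impl FnOnce(&'a [u8]) -> Res<'a, U>) -> Res<'a, T> {
    r.and_then(|(rest, t)| f(rest).map(|(rest, _)| (rest, t)))
}

// `a >> b`: run `b` after `a`, keep the value of `b`.
pub fn then<'a, T, U>(r: Res<'a, T>, f: impl FnOnce(&'a [u8]) -> Res<'a, U>) -> Res<'a, U> {
    r.and_then(|(rest, _)| f(rest))
}

// `a <|> b`: try `a`; if it fails, try `b` on the original input.
pub fn or<'a, T>(
    i: &'a [u8],
    f: impl FnOnce(&'a [u8]) -> Res<'a, T>,
    g: impl FnOnce(&'a [u8]) -> Res<'a, T>,
) -> Res<'a, T> {
    f(i).or_else(|| g(i))
}

// Models: decimal() <* token(b';') >> token(b' ')
pub fn skip_then(i: &[u8]) -> Res<'_, u8> {
    then(skip(decimal(i), |i| token(i, b';')), |i| token(i, b' '))
}

// Models: decimal() <* token(b';') <|> any()
pub fn skip_or(i: &[u8]) -> Res<'_, u8> {
    or(i, |i| skip(decimal(i), |i| token(i, b';')), any)
}
```

In this model `skip_then(b"12; x")` consumes the number, the semicolon and the space and yields the space byte, while `skip_or(b"x;")` falls back to `any` because `decimal` fails on the first byte.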

Needs a few more tests before it is good to merge.

m4rw3r commented 8 years ago

Failing on compiletest-fail:

     Running `/home/travis/build/m4rw3r/chomp/target/debug/compile_fail_test-5eb103a74b266c25`
running 1 test
running 9 tests
test [compile-fail] compile-fail/ascii_signed_unsigned_type1.rs ... ok
test [compile-fail] compile-fail/ascii_signed_unsigned_type2.rs ... ok
test [compile-fail] compile-fail/ascii_signed_unsigned_type4.rs ... ok
test [compile-fail] compile-fail/ascii_signed_unsigned_type.rs ... ok
test [compile-fail] compile-fail/ascii_signed_unsigned_type3.rs ... ok
test [compile-fail] compile-fail/macros_tailing_bind3.rs ... FAILED
test [compile-fail] compile-fail/macros_tailing_bind1.rs ... FAILED
test [compile-fail] compile-fail/macros_tailing_bind2.rs ... FAILED
test [compile-fail] compile-fail/macros_tailing_bind4.rs ... FAILED

All of the failing tests above compile successfully when they should fail to compile because of a trailing let-statement.

m4rw3r commented 8 years ago

Operator precedence is now implemented by expanding Expr and Term to group parts of the expression around lower-precedence operators. This may hit the macro recursion limit (the default is 64), but adding specific cases can mitigate the problem (though that is somewhat annoying for the operators since they are not single tokens, whereas ; was much easier).
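The grouping idea can be sketched with a set of toy macros. This is an illustration under assumptions, not the code in this PR: `prec!`, `split_then!`, `split_alt!`, `split_skip!` and `precedence_demo` are hypothetical names, terms are restricted to plain token sequences, and the output is a string showing the grouping. Each level scans for its own operator one token at a time, which is also why long expressions cost many recursion steps.

```rust
// Hypothetical sketch, NOT chomp's implementation. Each macro collects tokens
// in `[...]` until it sees its operator, then hands the collected left-hand
// side to the next (tighter-binding) level and recurses on the right.

// Tightest level: `<*` (two tokens, `<` and `*`).
macro_rules! split_skip {
    ([$($lhs:tt)+] < * $($rhs:tt)+) => {
        format!("({} <* {})", stringify!($($lhs)+), split_skip!([] $($rhs)+))
    };
    // Not an operator: shift one token into the accumulator.
    ([$($lhs:tt)*] $next:tt $($rest:tt)*) => {
        split_skip!([$($lhs)* $next] $($rest)*)
    };
    // End of input without `<*`: a plain term.
    ([$($lhs:tt)+]) => { stringify!($($lhs)+).to_string() };
}

// Middle level: `<|>` (three tokens, `<`, `|` and `>`).
macro_rules! split_alt {
    ([$($lhs:tt)+] < | > $($rhs:tt)+) => {
        format!("({} <|> {})", split_skip!([] $($lhs)+), split_alt!([] $($rhs)+))
    };
    ([$($lhs:tt)*] $next:tt $($rest:tt)*) => {
        split_alt!([$($lhs)* $next] $($rest)*)
    };
    ([$($lhs:tt)+]) => { split_skip!([] $($lhs)+) };
}

// Loosest level: `>>` (a single token).
macro_rules! split_then {
    ([$($lhs:tt)+] >> $($rhs:tt)+) => {
        format!("({} >> {})", split_alt!([] $($lhs)+), split_then!([] $($rhs)+))
    };
    ([$($lhs:tt)*] $next:tt $($rest:tt)*) => {
        split_then!([$($lhs)* $next] $($rest)*)
    };
    ([$($lhs:tt)+]) => { split_alt!([] $($lhs)+) };
}

// Entry point: start at the lowest-precedence operator.
macro_rules! prec {
    ($($t:tt)+) => { split_then!([] $($t)+) };
}

// Hypothetical helper returning the groupings of the two example expressions.
pub fn precedence_demo() -> (String, String) {
    (prec!(a <* b >> c), prec!(a <* b <|> c))
}
```

Since every shifted token is one more nested macro invocation, a long enough expression would exceed the recursion limit in this sketch; matching the multi-token operators directly in dedicated rules is one way to cut the number of steps.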

Tests are still needed for the new features in Expr.

New Expr and Term

Expr      ::= ExprAlt
            | ExprAlt ">>" Expr
ExprAlt   ::= ExprSkip
            | ExprSkip "<|>" ExprAlt
ExprSkip  ::= Term
            | Term "<*" ExprSkip

Term      ::= Ret
            | Err
            | Inline
            | Named
            | '(' Expr ')'

Old

Expr      ::= Term
            | Named Operator Expr
            | '(' Expr ')'
            | '(' Expr ')' Operator Expr
Term      ::= Ret
            | Err
            | Inline
            | Named