igordejanovic / rustemo

LR/GLR parser generator for Rust https://igordejanovic.github.io/rustemo/
Apache License 2.0
32 stars 3 forks source link

build Status documentation status latest Version

Rustemo is a LR/GLR parser generator for Rust.


Status: Fairly complete feature set. Very good test/docs coverage. Not yet optimized for speed so don't expect blazing performance.

Feedback is welcome!

Be sure to check Rustemo book. There you can find a detailed description and a comprehensive tutorial.

All features are covered with integration tests. So these can serve as a very good source of information.

There are also a few examples.

User fogarecious has contributed a simple tutorial geared towards Rustemo beginners so, though unofficial, this is also a good material to read while learning Rustemo.

Aspirations

Small example

Let's start with the ambiguous grammar calc.rustemo:

E: E '+' E
 | E '*' E
 | Number
;

terminals
Number: /\d+/;
Add: '+';
Mul: '*';

This grammar cannot be accepted by LR(1) parser but is accepted by GLR. So let's create GLR parser for this grammar, and dot visualization of the parser automaton:

$ rcomp --dot --parser-algo glr calc.rustemo
Generating parser for grammar "calc.rustemo"
Writting dot file: "calc.dot"

LALR(1) automaton for this grammar has conflicts in states 5 and 6 but that's not a problem for GLR.

Let's now test our parser.

#![cfg(test)]
mod calc;
mod calc_actions;
use crate::calc::{CalcParser, DefaultBuilder};
use rustemo::Parser;

#[test]
fn test_glr() {
    let forest = CalcParser::new().parse("2 + 3 * 4 + 1").unwrap();

    // We have 5 possible solutions, see https://en.wikipedia.org/wiki/Catalan_number
    assert_eq!(forest.solutions(), 5);

    // Evaluate each tree from the forest
    let results = forest
        .into_iter()
        .map(|tree| {
            let mut builder = DefaultBuilder::new();
            tree.build(&mut builder)
        })
        .collect::<Vec<_>>();

    assert_eq!(results, vec![21, 15, 25, 15, 17]);
}

DefaultBuilder generated by rcomp use generated and manually tuned actions from calc_actions. For more details see full tutorial in the Rustemo book.

Now, let's make this grammar acceptable by LR parser. The easiest way to do it, while keeping the grammar readable is to use Rustemo declarative disambiguation to resolve shift-reduce conflicts thus making the parsing deterministic. For this we specify that both operations are left associative and that * operation has higher precedence:

E: E '+' E {left, 1}
 | E '*' E {left, 2}
 | Number
;

terminals
Number: /\d+/;
Add: '+';
Mul: '*';

It is now possible to generate parser using the default LR algorithm and the default lalr-pager tables (an improved version of LALR, there is also a standard lalr table support if needed, see rcomp --help):

$ rcomp calclr.rustemo
Generating parser for grammar "calclr.rustemo"

Let's test our LR grammar:

mod calclr;
mod calclr_actions;
use self::calclr::CalclrParser;
use rustemo::Parser;

#[test]
fn test_lr() {
    let result = CalclrParser::new().parse("2 + 3 * 4 + 1").unwrap();

    // As the parsing is deterministic now we have just 1 solution which is
    // automatically evaluated using the default builder and provided actions
    assert_eq!(result, 15);
}

This is just a glimpse of what is possible. For more information see the Rustemo book.

Roadmap (tentative)

v0.1.0

v0.2.0

Next releases until v1.0 (see CHANGELOG.md for the details)

v1.0

Post v1.0

License

Licensed under either of

Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Please see Contributing guide in the docs for the details.

Credits

Bootstrapping approach and the idea of macro for loading the generated code are based on the approach taken in the LALRPOP project.

Similar projects

The architecture and the general idea of Rustemo is loosely based on a similar project for Python, called parglare, I've started several years ago.

I have found a lot of inspiration and ideas in the following projects:

Why this name?

Rustemo is pronounced the same as Serbian word "растемо" which means "we grow". The name is a tribute to the awesome and ever growing Rust community.