kamadorueda / santiago

Santiago is a lexing and parsing toolkit for Rust

Issue while trying to fine-tune token precedence using Associativity rules #5

Open springcomp opened 1 year ago

springcomp commented 1 year ago

The crate documentation shows a very simple grammar to tokenize and parse arithmetic addition of integers.

use santiago::lexer::LexerRules;
use santiago::grammar::{Grammar, Associativity};

pub fn lexer_rules() -> LexerRules {
    santiago::lexer_rules!(
        // One or more sequential digits from 0 to 9 will be mapped to an "INT"
        "DEFAULT" | "INT" = pattern r"[0-9]+";
        // A literal "+" will be mapped to "PLUS"
        "DEFAULT" | "PLUS" = string "+";
        // Whitespace " " will be skipped
        "DEFAULT" | "WS" = pattern r"\s" => |lexer| lexer.skip();
    )
}
pub fn grammar() -> Grammar<()> {
    santiago::grammar!(
        "sum" => rules "sum" "plus" "sum";
        "sum" => lexemes "INT";

        "plus" => lexemes "PLUS";
    )
}

The example demonstrates that parsing this grammar yields two different abstract syntax trees, because the two additions in the expression 10 + 20 + 30 can be grouped in two ways: (10 + 20) + 30 or 10 + (20 + 30).
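To make the ambiguity concrete, here is a minimal standalone sketch that enumerates every tree the rule `sum => sum plus sum | INT` can derive for a list of operands. It does not use santiago at all; the `Tree` type and the `all_trees` and `show` helpers are hypothetical names introduced only for this illustration.

```rust
// Hypothetical tree type for illustration; this is not santiago's own AST type.
#[derive(Debug, Clone, PartialEq)]
enum Tree {
    Int(u32),
    Sum(Box<Tree>, Box<Tree>),
}

// Enumerate every tree that the ambiguous rule `sum => sum plus sum | INT`
// can derive for the operands in `ints`, keeping their left-to-right order.
fn all_trees(ints: &[u32]) -> Vec<Tree> {
    if ints.len() == 1 {
        return vec![Tree::Int(ints[0])];
    }
    let mut trees = Vec::new();
    // Each split point picks which "+" becomes the root of the tree.
    for split in 1..ints.len() {
        for left in all_trees(&ints[..split]) {
            for right in all_trees(&ints[split..]) {
                trees.push(Tree::Sum(Box::new(left.clone()), Box::new(right)));
            }
        }
    }
    trees
}

// Render a tree with explicit parentheses so the grouping is visible.
fn show(tree: &Tree) -> String {
    match tree {
        Tree::Int(n) => n.to_string(),
        Tree::Sum(l, r) => format!("({} + {})", show(l), show(r)),
    }
}

fn main() {
    // Prints one line per possible grouping of 10 + 20 + 30.
    for tree in all_trees(&[10, 20, 30]) {
        println!("{}", show(&tree));
    }
}
```

For three operands this yields exactly two trees, one grouped to the right and one grouped to the left. An associativity rule is what would let a parser keep only one of them.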

However, if I change the grammar slightly, then:

This is a sample repro-case of an issue I’m facing on a more complex grammar:

    santiago::grammar!(
        "expr" => rules "sum";
        "expr" => rules "num";

        "sum" => rules "expr" "plus" "expr";

        "num" => lexemes "INT";
        "plus" => lexemes "PLUS";
    )

Is there anything I’m doing wrong?