kamadorueda / santiago

Santiago is a lexing and parsing toolkit for Rust
89 stars 7 forks source link

Silently ignores circular rules #3

Open safinaskar opened 2 years ago

safinaskar commented 2 years ago

Consider this code:

use santiago::lexer::LexerRules;
use santiago::grammar::Grammar;

pub fn lexer_rules() -> LexerRules {
    santiago::lexer_rules!(
        "DEFAULT" | "A" = string "a";
    )
}

pub fn grammar() -> Grammar<()> {
    santiago::grammar!(
        "X" => lexemes "A";
        "X" => rules "X";
    )
}

fn main() {
    let input = "a";
    let lexer_rules = lexer_rules();
    let lexemes = santiago::lexer::lex(&lexer_rules, input).unwrap();
    let grammar = grammar();
    let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
    println!("{:?}", parse_trees);
}

I use santiago 1.3.1 .

This grammar is infinitely ambiguous on this input, so the program should (theoretically) output this infinite output:

[Γ := rules "X"
  X := lexemes "A"
    A "a" (1, 1)
, Γ := rules "X"
  X := rules "X"
    X := lexemes "A"
      A "a" (1, 1)
, ...

But instead the program prints one parse tree only:

[Γ := rules "X"
  X := lexemes "A"
    A "a" (1, 1)
]

I think this is a bug.

Of course, in practice this is impossible to return infinite vector, so the library should somehow report error. But the library simply returns one parse tree instead. This probably means that there is some bug in the library. So I lose trust in this library

safinaskar commented 2 years ago

Even worse bug! I found another bug. I suspect both bugs caused by one root cause. Consider this code:

use santiago::lexer::LexerRules;
use santiago::grammar::Grammar;

pub fn lexer_rules() -> LexerRules {
    santiago::lexer_rules!(
        "DEFAULT" | "ID" = pattern r"[a-z]+";
        "DEFAULT" | "==>" = string "==>";
        "DEFAULT" | "::" = string "::";
    )
}

pub fn grammar() -> Grammar<()> {
    santiago::grammar!(
        "t0" => rules "t1";
        "t1000" => lexemes "ID";
        "t3" => rules "t4" "dotdot" "t0";
        "t1" => rules "t2" "arr" "t1";
        "t1" => rules "t2";
        "t2" => rules "t3";
        "t3" => rules "t4";
        "t4" => rules "t1000";
        "dotdot" => lexemes "::";
        "arr" => lexemes "==>";
    )
}

fn main() {
    let input = "x::y==>z";
    let lexer_rules = lexer_rules();
    let lexemes = santiago::lexer::lex(&lexer_rules, input).unwrap();
    let grammar = grammar();
    let parse_trees = santiago::parser::parse(&grammar, &lexemes).unwrap();
    println!("{:?}", parse_trees);
}

x::y==>z allows two parse trees: (x::y)==>z and x::(y==>z). But this program prints one only:

[Γ := rules "t0"
  t0 := rules "t1"
    t1 := rules "t2" "arr" "t1"
      t2 := rules "t3"
        t3 := rules "t4" "dotdot" "t0"
          t4 := rules "t1000"
            t1000 := lexemes "ID"
              ID "x" (1, 1)
          dotdot := lexemes "::"
            :: "::" (1, 2)
          t0 := rules "t1"
            t1 := rules "t2"
              t2 := rules "t3"
                t3 := rules "t4"
                  t4 := rules "t1000"
                    t1000 := lexemes "ID"
                      ID "y" (1, 4)
      arr := lexemes "==>"
        ==> "==>" (1, 5)
      t1 := rules "t2"
        t2 := rules "t3"
          t3 := rules "t4"
            t4 := rules "t1000"
              t1000 := lexemes "ID"
                ID "z" (1, 8)
]