shnewto / bnf

Parse BNF grammar definitions
MIT License
256 stars 22 forks source link

Nullable productions (Still, Still) Fail to Match #117

Closed amascolo closed 1 year ago

amascolo commented 1 year ago

Related to #108 and #112 – failures reproduce on 0.4.0 ≀ bnf ≀ 0.4.3 so not a regression from #113 .

Hit more cases where nullable productions aren't handled properly.

This example looks contrived but was adapted from a larger grammar I'm writing.

As usual, thanks for all your work @CrockAgile !

Sadly, this correctness issue continues to be a total blocker preventing me from adopting bnf

Hoping my battle-testing hasn't been in vain and there's an easy fix around the corner πŸ™πŸ»

use bnf::Grammar;

#[test]
fn test() {
    let fails: &str = "
    <disjunction> ::= <predicate> | <disjunction> <or> <predicate>
    <predicate> ::= <char_null_one> | <special-string> '.'

    <char_null_one> ::= <char_null_two>
    <char_null_two> ::= <char_null_three>
    <char_null_three> ::= <char>

    <or> ::= <ws> 'or' <ws>
    <ws> ::= <whitespace> | ' ' <ws>
    <whitespace> ::= ' '

    <special-string> ::= <special-char> | <special-char> <special-string>
    <special-char> ::= <char> | <whitespace>
    <char> ::= 'a'
    ";

    let input = "a or a";

    let passes_1 = fails.replace(
        // skip nullable production <char_null_two>
        "<char_null_one> ::= <char_null_two>",
        "<char_null_one> ::= <char_null_three>",
    );
    assert!(passes_1.parse::<Grammar>().unwrap().parse_input(input).next().is_some());

    let passes_2 = fails.replace(
        // replace <whitespace> with its terminal ' '
        "<ws> ::= <whitespace> | ' ' <ws>",
        "<ws> ::= ' ' | ' ' <ws>",
    );
    assert!(passes_2.parse::<Grammar>().unwrap().parse_input(input).next().is_some());

    let passes_3 = fails.replace(
        // again, replace <whitespace> with its terminal ' '
        "<special-char> ::= <char> | <whitespace>",
        "<special-char> ::= <char> | ' '",
    );
    assert!(passes_3.parse::<Grammar>().unwrap().parse_input(input).next().is_some());

    assert!(
        fails.parse::<Grammar>().unwrap().parse_input(input).next().is_some(),
        "all grammars except last one parsed: {input}"
    );
}
CrockAgile commented 1 year ago

thanks for the great reproduction cases as usual! I am unavailable for the next week tho, so will be slower to get to fixing this

amascolo commented 1 year ago

Here's another example with nullable productions – which still fails in #119.

In this case, the input is ambiguous and should return two parse trees.

However, adding a nullable production prevents one of the trees from being parsed.

use bnf::Grammar;

#[test]
fn test() {
    let null = "<null> ::= <and>";
    let bnf = "
    <and> ::= <and> ' AND ' <terminal>
            | <and> ' ' <terminal>
            | <terminal>
    <terminal> ::= 'AND'
    ";
    let input = "AND AND AND";
    assert_eq!(bnf.parse::<Grammar>().unwrap().parse_input(input).count(), 2);
    assert_eq!([null, bnf].join("\n").parse::<Grammar>().unwrap().parse_input(input).count(), 2);
}
CrockAgile commented 1 year ago

Thanks for the great example! I am traveling right now, but will take a deep look soon

amascolo commented 1 year ago

Here's a related test case (same grammar but different input).

Should hopefully be the last one!

If I didn't miscount, it should produce 5 parse trees – on 0.4.3 it produces 1, on #119 it produces 2.

use bnf::Grammar;

#[test]
fn test() {
    let bnf = "
    <and> ::= <and> ' AND ' <terminal>
            | <and> ' ' <terminal>
            | <terminal>
    <terminal> ::= 'AND'
    ";
    let input = "AND AND AND AND AND";
    // 1. 'AND' <and> 'AND' <and> 'AND'
    // 2. 'AND' <and> 'AND' 'AND' 'AND'
    // 3. 'AND' 'AND' <and> 'AND' 'AND'
    // 4. 'AND' 'AND' 'AND' <and> 'AND'
    // 5. 'AND' 'AND' 'AND' 'AND' 'AND'
    assert_eq!(bnf.parse::<Grammar>().unwrap().parse_input(input).count(), 5);
}
CrockAgile commented 1 year ago

Just wanted to share that I have reproduced and narrowed in on the issue. Unfortunately, the problem requires some deep thoughts and reading Earley parser papers again πŸ˜…

amascolo commented 1 year ago

Take your time, really appreciate all the work you are putting into bnf.

Earley parsing is tricky to get 100% right, but will be worth it once we have a battle-tested Rust library πŸ’ͺ🏻