zesterer / chumsky

Write expressive, high-performance parsers with ease.
https://crates.io/crates/chumsky
MIT License
3.65k stars 154 forks source link

`recursion` causing stack overflow and I can't work out why #667

Closed koshell closed 3 months ago

koshell commented 3 months ago

I'm probably doing something wrong, but I couldn't find something similar to what I'm trying to do in the docs.

Basically I've been trying to follow the EBNF spec for Lua. With it there are (at least) two places expr would recurse: expr binop expr or unop expr

All the examples of recursion I could find in the docs appear to only use the recursive parser once, I'm wondering if trying to use it for two branches like this is incorrect? Some of the examples in the docs Box the parser but I'm unclear as to why and doing that didn't appear to help (or I was Boxing the wrong thing...).

I can probably remove the need for recursion. This is just meant to be a lexer so it probably doesn't need to understand the difference between an expression, or a statement, etc. I had thought that making it aware of the context would allow it to reject some invalid syntax earlier (for example A + + B would always be invalid regardless of what A or B actually are) but I may have been overly ambitious.

// Tokens<'t> ≈ Vec<(&'t str, SimpleSpan)> //
fn expr<'t>() -> impl Parser<'t, &'t str, Tokens<'t>> + Clone {
    recursive(|expr| {
        choice((
            text::keyword("nil").map_with(|_, e| (Token::Nil.to_tokens(e.span()))),
            just("...").map_with(|_, e| (Token::Ellipsis.to_tokens(e.span()))),
            bool(),
            number(),
            string(),
            // function,
            // prefixexp,
            // tableconstructor,
            unop()
                .padded()
                .then(expr.clone().padded())
                .map(|(a, b)| a.chain(b)),
            expr.clone()
                .padded()
                .then(binop().padded())
                .then(expr.padded())
                .map(|((a, b), c)| Tokens::chain(a, b).chain(c)),
        ))
    })
}
koshell commented 3 months ago

After thinking about my problem as I wrote this out I realised the issue.

// Tokens<'t> ≈ Vec<(&'t str, SimpleSpan)> //
fn expr<'t>() -> impl Parser<'t, &'t str, Tokens<'t>> + Clone {
    recursive(|expr| {
        choice((
            text::keyword("nil").map_with(|_, e| (Token::Nil.to_tokens(e.span()))),
            just("...").map_with(|_, e| (Token::Ellipsis.to_tokens(e.span()))),
            bool(),
            number(),
            string(),
            // function,
            // prefixexp,
            // tableconstructor,
            unop()
                .padded()
                .then(expr.clone().padded())
                .map(|(a, b)| a.chain(b)),
            expr.clone() // <== The issue is here
                .padded()
                .then(binop().padded())
                .then(expr.padded())
                .map(|((a, b), c)| Tokens::chain(a, b).chain(c)),
        ))
    })
}

I visualise the code path as:

Instead I moved the binop expression:

fn expr<'t>() -> impl Tokeniser<'t> {
    recursive(|expr| {
        choice((
            text::keyword("nil").map_with(|_, e| (Token::Nil.to_tokens(e.span()))),
            just("...").map_with(|_, e| (Token::Ellipsis.to_tokens(e.span()))),
            bool(),
            number(),
            string(),
            // function,
            // prefixexp,
            // tableconstructor,
            unop()
                .padded()
                .then(expr.clone().padded())
                .map(|(a, b)| a.chain(b)),
        ))
        .padded()
        .then(
            binop()
                .padded()
                .then(expr.padded())
                .map(|(a, b)| a.chain(b))
                .or_not(),
        )
        .map(|(a, b)| match b {
            None => a,
            Some(b) => a.chain(b),
        })
    })
}

Which now works flawlessly. Lesson of the day, buy a rubber ducky.

zesterer commented 3 months ago

Great that you solved this :) I recommend looking into the new pratt combinator, which makes writing expression parsers with precedence trivial!