zesterer / chumsky

Write expressive, high-performance parsers with ease.
https://crates.io/crates/chumsky
MIT License
3.63k stars 155 forks source link

Stack overflow caused by defining an unused parser #639

Closed AverageLinuxEnjoyer closed 5 months ago

AverageLinuxEnjoyer commented 5 months ago

So I have this monstrosity of a parser which you can pretty much ignore in its entirety, except for the clearly highlighted section between the comments. There I define a refined_ty (or _ in the first snippet) parser, which recursively calls this entire big parser (type_parser) and another really big parser (expr_parser, which also calls type_parser inside several times).

Anyways, this builds but the tests cause a stack overflow. The first thought is that I might have did something wrong on the grammar side and the grammar is accidentally left-recursive or something. But I actually forgot to use it. refined_ty is only defined, but not used anywhere. By further inspecting what tests cause the stack overflow it seems like any test can cause that. Every time I run cargo test a random test stack overflows.

Here is a shorter version, in case it's easier to look at. I'm using _ to throw the value away immediately and illustrate I'm not using it, but a runtime stack overflow still happens. Any ideas? (the complete version is under)

pub fn type_parser<'a>(
) -> impl Parser<'a, &'a [Token], Ty, extra::Err<Rich<'a, Token, SimpleSpan<usize>>>> + Clone {
    let ty = recursive(|ty| {
        // ...

        // ============================================================================
        // ============================================================================
        let _ = select! {
            Token::Ident(id) => id
        }
        .then_ignore(just(Token::Colon))
        .then(ty.clone())
        .then_ignore(just(Token::Comma))
        .then(expr_parser())
        .delimited_by(just(Token::LCurly), just(Token::RCurly))
        .map(|((id, ty), expr)| Ty::Refined(id, Box::new(ty), Box::new(expr)));
        // ============================================================================
        // ============================================================================

        // ...

        primary_ty.then(ty_suffix).map(|(p, s)| match s {
            Some(Ty::Sum(mut v)) => {
                v.insert(0, p);
                Ty::Sum(v)
            }
            Some(t) => Ty::Sum(vec![p, t]),
            None => p,
        })
    });

    ty
}

The long version:

pub fn type_parser<'a>(
) -> impl Parser<'a, &'a [Token], Ty, extra::Err<Rich<'a, Token, SimpleSpan<usize>>>> + Clone {
    let ty = recursive(|ty| {
        let ident = select! {
            Token::Ident(ident) => match ident.as_str() {
                "_" => Ty::Infer,
                _ => Ty::TyCtor(ident, vec![])
            }
        };

        let ty_list = ty
            .clone()
            .separated_by(just(Token::Comma))
            .allow_trailing()
            .collect::<Vec<_>>()
            .boxed();

        let with_ty_vars = ident
            .then(
                ty_list
                    .clone()
                    .delimited_by(just(Token::Ls), just(Token::Gr))
                    .or_not(),
            )
            .map(|(ty_ctor, ty_var)| match ty_ctor {
                Ty::TyCtor(ident, _) => Ty::TyCtor(ident, ty_var.unwrap_or(vec![])),
                _ => Ty::Infer,
            });

        let tuple_ty = ty_list
            .clone()
            .delimited_by(just(Token::LParen), just(Token::RParen))
            .map(|tuple| {
                let len = tuple.len();
                match len {
                    0 => Ty::Unit,
                    1 => tuple[0].clone(),
                    2.. => Ty::TypeTuple(tuple),
                }
            });

        let array_ty = ty
            .clone()
            .delimited_by(just(Token::LSquare), just(Token::RSquare))
            .map(|ty| Ty::Arr(Box::new(ty)));

        let fn_ty = just(Token::Fn)
            .ignore_then(ty_list.delimited_by(just(Token::LParen), just(Token::RParen)))
            .then_ignore(just(Token::MinusGr))
            .then(ty.clone())
            .map(|(params, ret)| Ty::Fn(params, Box::new(ret)));

        // ============================================================================
        // ============================================================================
        let refined_ty = select! {
            Token::Ident(id) => id
        }
        .then_ignore(just(Token::Colon))
        .then(ty.clone())
        .then_ignore(just(Token::Comma))
        .then(expr_parser())
        .delimited_by(just(Token::LCurly), just(Token::RCurly))
        .map(|((id, ty), expr)| Ty::Refined(id, Box::new(ty), Box::new(expr)));
        // ============================================================================
        // ============================================================================

        let primary_ty = with_ty_vars.or(tuple_ty).or(array_ty).or(fn_ty);

        let ty_suffix = just(Token::VerticalBar)
            .ignore_then(ty.clone())
            .repeated()
            .collect::<Vec<_>>()
            .map(|tys| {
                if tys.is_empty() {
                    None
                } else if tys.len() == 1 {
                    Some(tys.into_iter().next().unwrap())
                } else {
                    Some(Ty::Sum(tys))
                }
            });

        primary_ty.then(ty_suffix).map(|(p, s)| match s {
            Some(Ty::Sum(mut v)) => {
                v.insert(0, p);
                Ty::Sum(v)
            }
            Some(t) => Ty::Sum(vec![p, t]),
            None => p,
        })
    });

    ty
}
AverageLinuxEnjoyer commented 5 months ago

I'm sorry, actually I realized my mistake. The fact that type_parser calls expr_parser which then calls type_parser again creates an indirect recursion.

Is this a case in which I should use Recursive::declare()?

zesterer commented 5 months ago

Yes, the point of Recursive is to remove the need to create parsers recursively.