zesterer / chumsky

Write expressive, high-performance parsers with ease.
https://crates.io/crates/chumsky
MIT License

Question: How to reduce symbol length? #672

Open eternal-flame-AD opened 4 hours ago

eternal-flame-AD commented 4 hours ago

Would appreciate some advice! I read the advice section in the docs, used boxing in each function, and used choice wherever three or more alternatives were chained with or. My compile time is okay (~10 seconds on an LTO'ed build, excluding dependencies), but I get giant symbol names, which feels wasteful. For example, this 29-byte function gets a 30 KB+ name. Is stripping the binary the only option here?

Some naive grepping gives me this histogram. I attached the full name of the longest symbol. (Attachments: symbol.txt, image)

Example of my code:

pub fn class<'tokens, 'src: 'tokens>() -> impl Parser<
    'tokens,
    ParserInput<'tokens, 'src>,
    Class<'src>,
    extra::Err<Rich<'tokens, Token<'src>, Span>>,
> + Clone {
    let tags = tags();

    let kw = just(Token::Keyword(Keyword::Class));

    let name = select! {
        Token::Ident(name) => name
    };

    let maybe_extends = just(Token::Keyword(Keyword::Extends))
        .ignore_then(path())
        .map(Some)
        .or(empty().map(|_| None));

    let body = just(Token::Ctrl('{'))
        .ignore_then(
            choice((
                var_decl()
                    .then_ignore(just(Token::Ctrl(';')))
                    .map(|vardecl| (Some(vardecl), None, None, None)),
                function().map(|func| (None, Some(func), None, None)),
                typedef().map(|typedef| (None, None, Some(typedef), None)),
                import().map(|import| (None, None, None, Some(import))),
            ))
            .repeated()
            .collect::<Vec<_>>(),
        )
        .then_ignore(just(Token::Ctrl('}')))
        .map(|body| {
            let mut members = Vec::new();
            let mut functions = Vec::new();
            let mut typedefs = Vec::new();
            let mut imports = Vec::new();
            for item in body {
                match item {
                    (Some(vardecl), None, None, None) => members.push(vardecl),
                    (None, Some(func), None, None) => functions.push(func),
                    (None, None, Some(typedef), None) => typedefs.push(typedef),
                    (None, None, None, Some(import)) => imports.push(import),
                    _ => unreachable!(),
                }
            }
            (members, functions, typedefs, imports)
        });

    tags.then_ignore(kw)
        .then(name)
        .then(maybe_extends)
        .then(body)
        .map(
            |(((tags, name), extends), (members, functions, typedefs, imports))| Class {
                tags,
                name,
                extends,
                typedefs,
                imports,
                members,
                functions,
            },
        )
}

zesterer commented 3 hours ago

You can use .boxed() to switch to dynamic dispatch, which avoids large types. See here. Note that this doesn't always come with a performance hit: LLVM is often able to devirtualise the parsers and still statically knit them together. Sometimes, performance can even improve!
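To illustrate why type erasure keeps names short, here is a minimal std-only sketch. It does not use chumsky's API: the `Parser` trait, `Just`, `Then`, and the helper functions are all hypothetical toy stand-ins. The idea it shows is that a statically composed parser carries every nested combinator in its type, while a boxed one collapses to a single `Box<dyn Parser>` regardless of what is inside. The exact strings returned by `std::any::type_name` are not guaranteed by the standard library, so only their relative lengths are compared.

```rust
use std::any::type_name;

// Hypothetical toy trait standing in for a combinator library's parser trait.
// On success it returns the number of bytes consumed.
trait Parser {
    fn parse(&self, input: &str) -> Option<usize>;
}

// Matches a single expected character at the start of the input.
struct Just(char);

impl Parser for Just {
    fn parse(&self, input: &str) -> Option<usize> {
        input.starts_with(self.0).then(|| self.0.len_utf8())
    }
}

// Runs `A`, then `B` on the remaining input. Both parser types are part of
// this type, so nesting `Then` makes the full type (and its name) grow.
struct Then<A, B>(A, B);

impl<A: Parser, B: Parser> Parser for Then<A, B> {
    fn parse(&self, input: &str) -> Option<usize> {
        let a = self.0.parse(input)?;
        let b = self.1.parse(&input[a..])?;
        Some(a + b)
    }
}

// A boxed parser is itself a parser: calls are forwarded through the vtable.
impl Parser for Box<dyn Parser> {
    fn parse(&self, input: &str) -> Option<usize> {
        (**self).parse(input)
    }
}

// Statically composed: the concrete type nests three `Then`s and four `Just`s.
fn static_chain() -> impl Parser {
    Then(Then(Then(Just('a'), Just('b')), Just('c')), Just('d'))
}

// Type-erased: the same parser, but callers only ever see `Box<dyn Parser>`.
fn boxed_chain() -> Box<dyn Parser> {
    Box::new(static_chain())
}

// Length of the type name of the value passed in.
fn type_name_len<T>(_: &T) -> usize {
    type_name::<T>().len()
}

// Returns (static name length, boxed name length) for comparison.
fn compare_name_lengths() -> (usize, usize) {
    (
        type_name_len(&static_chain()),
        type_name_len(&boxed_chain()),
    )
}
```

Calling `compare_name_lengths()` from a `main` function and printing the pair should show the boxed name being the shorter of the two. Wrapping two boxed parsers in another `Then` adds only one more layer to the name, which is why boxing at function boundaries tends to bound symbol length in this kind of design.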

eternal-flame-AD commented 3 hours ago

So far I have only boxed expressions, statements, and primitives like ifs. I will try boxing them all tomorrow and report back. Thanks!