zesterer / chumsky

Write expressive, high-performance parsers with ease.
https://crates.io/crates/chumsky
MIT License

Support collecting with custom allocators #509

Open safasofuoglu opened 1 year ago

safasofuoglu commented 1 year ago

I see that with_state is good for string interning and per-item arenas. Can it currently be used to collect a JSON array into a bumpalo Vec? If not, what kind of modifications would be needed? Would this capability be a sensible addition to chumsky?

safasofuoglu commented 1 year ago

Initial attempt:

fn f32vec<'a, E: Error<'a, &'a [u8]> + 'a, T: FromLexical>(
) -> impl Parser<'a, &'a [u8], (), extra::Full<E, bumpalo::collections::Vec<'a, T>, ()>> {
    number::<{ lexical::format::STANDARD }, _, _, _>()
        .padded()
        .map_with_state(|number, _, vec: &mut bumpalo::collections::Vec<T>| vec.push(number))
        .separated_by(just(b","))
        .collect::<()>()
        .delimited_by(just(b'['), just(b']'))
}

let str = b"[2, 5]";
let bump = Bump::new();
let mut vec = bumpalo::collections::Vec::new_in(&bump);
let res = f32vec::<Rich<u8>, f32>().parse_with_state(str, &mut vec);
dbg!(res);
dbg!(vec);

While it's technically possible to invoke f32vec manually, it's unclear how to use it in combinators, especially when and how the Vec gets created. Too eager creation would defeat the purpose. Ideally, f32vec would take a reference to bump and return an owned bumpalo::collections::Vec.

Zij-IT commented 1 year ago

Here is the best I got. It's not ideal, as it would be nice to have foldr/foldl where the initial value is constructed with access to the state with something like: Fn(&mut E::State) -> A

use bumpalo::collections;
use bumpalo::Bump;
use chumsky::{error::Error, prelude::*};
use lexical::FromLexical;

fn f32vec<'a, 'b: 'a, E: Error<'a, &'a [u8]> + 'a, T: FromLexical>(
) -> impl Parser<'a, &'a [u8], collections::Vec<'a, T>, extra::Full<E, &'b bumpalo::Bump, ()>> {
    let number = number::<{ lexical::format::STANDARD }, _, _, _>().padded();

    empty()
        .map_with_state(|_, _, bump| bumpalo::vec![in *bump])
        .foldl(number.separated_by(just(b",")), |mut vec, i| {
            vec.push(i);
            vec
        })
        .delimited_by(just(b'['), just(b']'))
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn empty() {
        let str = b"[]";
        let bump = Bump::new();
        let res = f32vec::<Rich<u8>, f32>()
            .parse_with_state(str, &mut &bump)
            .into_result();

        assert_eq!(res, Ok(bumpalo::vec![in &bump;]));
    }

    #[test]
    fn one_elem() {
        let str = b"[2]";
        let bump = Bump::new();
        let res = f32vec::<Rich<u8>, f32>()
            .parse_with_state(str, &mut &bump)
            .into_result();

        assert_eq!(res, Ok(bumpalo::vec![in &bump; 2.0]));
    }

    #[test]
    fn two_elem() {
        let str = b"[2, 5]";
        let bump = Bump::new();
        let res = f32vec::<Rich<u8>, f32>()
            .parse_with_state(str, &mut &bump)
            .into_result();

        assert_eq!(res, Ok(bumpalo::vec![in &bump; 2.0, 5.0]));
    }

    #[test]
    fn five_elem() {
        let str = b"[1, 2, 3, 4, 5]";
        let bump = Bump::new();
        let res = f32vec::<Rich<u8>, f32>()
            .parse_with_state(str, &mut &bump)
            .into_result();

        assert_eq!(res, Ok(bumpalo::vec![in &bump; 1.0, 2.0, 3.0, 4.0, 5.0]));
    }
}

> While it's technically possible to invoke f32vec manually, it's unclear how to use it in combinators, especially when and how the Vec gets created. Too eager creation would defeat the purpose. Ideally, f32vec would take a reference to bump and return an owned bumpalo::collections::Vec.

If you are referring to the creation of the Vec, you should know that, just like std::vec::Vec, bumpalo::collections::Vec doesn't allocate until elements are pushed into it.
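
The std half of that claim can be checked with a minimal sketch that uses only the standard library. The helper names `capacity_before_push` and `capacity_after_push` are made up for illustration; the bumpalo vector is assumed to behave the same way, as stated above, but is not exercised here.

```rust
/// Hypothetical helper: capacity of a freshly created, still empty Vec.
/// `Vec::new()` does not allocate, so the capacity is 0.
fn capacity_before_push() -> usize {
    Vec::<f32>::new().capacity()
}

/// Hypothetical helper: capacity after the first push.
/// The first push is what triggers the allocation.
fn capacity_after_push() -> usize {
    let mut v = Vec::new();
    v.push(2.0_f32);
    v.capacity()
}

fn main() {
    assert_eq!(capacity_before_push(), 0);
    assert!(capacity_after_push() >= 1);
}
```

So creating the vector eagerly costs nothing by itself; the cost only appears once the first element is parsed and pushed.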

zesterer commented 1 year ago

> It's not ideal, as it would be nice to have foldr/foldl where the initial value is constructed with access to the state

And foldl/r_with_state isn't suitable?

Zij-IT commented 1 year ago

I'm using foldl, but I have to lead the parser with an awkward empty().map_with_state(...) and drop the first two arguments @zesterer

Perhaps something like this:

    number::<{ lexical::format::STANDARD }, _, _, _>()
        .padded()
        .separated_by(just(b","))
        // Assume `bumpalo::Vec` impl `Collection`
        .collect_from_state(|bump| bumpalo::vec![in *bump])
        .delimited_by(just(b'['), just(b']'))

zesterer commented 1 year ago

Oh, I see. Yes, that's a bit annoying. I wonder whether it would be feasible to write some sort of Container impl for bumpalo's vector (extending the Container trait to have access to the parser state).
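
As a rough, std-only sketch of that idea: a container trait whose constructor receives the parser state. Everything here is hypothetical and is not chumsky's API: `ContainerIn`, `State`, and `collect_in` are names invented for illustration, and `State` merely stands in for something like a `&Bump`.

```rust
/// Hypothetical trait: a container that is created from the parser state.
trait ContainerIn<T, S> {
    fn new_in(state: &mut S) -> Self;
    fn push(&mut self, item: T);
}

/// Stand-in for parser state; a real use might hold a reference to an arena.
struct State {
    capacity_hint: usize,
    containers_created: usize,
}

impl<T> ContainerIn<T, State> for Vec<T> {
    fn new_in(state: &mut State) -> Self {
        // The container is only built here, with access to the state.
        state.containers_created += 1;
        Vec::with_capacity(state.capacity_hint)
    }

    fn push(&mut self, item: T) {
        // Resolves to the inherent `Vec::push`, not this trait method.
        Vec::push(self, item)
    }
}

/// Hypothetical driver: roughly what a state-aware `collect` would do.
fn collect_in<T, S, C: ContainerIn<T, S>>(
    state: &mut S,
    items: impl IntoIterator<Item = T>,
) -> C {
    let mut out = C::new_in(state);
    for item in items {
        out.push(item);
    }
    out
}

fn main() {
    let mut state = State { capacity_hint: 4, containers_created: 0 };
    let v: Vec<f32> = collect_in(&mut state, [2.0_f32, 5.0]);
    assert_eq!(v, vec![2.0, 5.0]);
    assert_eq!(state.containers_created, 1);
}
```

The point of the design is that the caller never constructs the container up front; the collecting step asks the state for one only when it is needed.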

zesterer commented 1 year ago

Ah, you got there first :grinning:

safasofuoglu commented 1 year ago

@Zij-IT wonderful example!

I guess things can be made more ergonomic, but nice to know there's a way with the current facilities and it's not so bad.