winnow-rs / winnow

Making parsing a breeze
https://docs.rs/winnow
Other
525 stars 40 forks source link

fold's param init func should be FnOnce instead of FnMut #513

Open epage opened 5 months ago

epage commented 5 months ago

Mirroring rust-bakery/nom#1742 for consideration

Take this for example

https://github.com/rust-bakery/nom/blob/90af84b2345c394e86f3d4e5b92466a73c2c4961/tests/arithmetic.rs#L60

You can't move the captured variable init out of the "init" func if it is a non-Copy type, even though init variable has been moved into the capturing "init" func. That's because FnMut means "init" can be called multiple times. This only works as a matter of fact that init variable is i64 in this case.

Since this func is called only once, FnOnce is more suitable here.

TennyZhuang commented 4 months ago

I want to parse something like Int[][], and the result should be DataType::Array(Box::new(DataType::Array(Box::new(DataType::Int)))).

My current implementation (compile failed):

fn data_type_stateful<S>(input: &mut StatefulStream<S>) -> PResult<DataType>
where
    S: TokenStream,
{
    let init = data_type_stateful_inner(input)?;
    repeat(0.., (Token::LBracket, Token::RBracket))
        .fold(
            move || init,
            |mut acc, _| {
                acc = DataType::Array(Box::new(acc));
                acc
            },
        )
        .parse_next(input)
}

The implementation can passed the compilation if init impl FnOnce, but I think another solution also works for me. How about make Init: Parser<Input, Result, Error>?

I can use it like this:

fn data_type_stateful<S>(input: &mut StatefulStream<S>) -> PResult<DataType>
where
    S: TokenStream,
{
    repeat(0.., (Token::LBracket, Token::RBracket))
        .fold_(
            data_type_stateful_inner,
            |mut acc, _| {
                acc = DataType::Array(Box::new(acc));
                acc
            },
        )
        .parse_next(input)
}

It's even more elegant and more reasonable to combinator style.

For the trivial init value, they still can pass empty.value(vec![]) as init, it's not much more complicated.

TennyZhuang commented 4 months ago

I'd like to contribute to that if any proposal is accepted.

epage commented 4 months ago

So if I understand, you are wanting to initialize the state of your fold from a previous parse result? I can see how it would be useful in your case but I'm unsure how well it generalizes. It would also violate one of our API guidelines which is that free functions contain grammar and fluent functions are for operating on the parsed result.

TennyZhuang commented 4 months ago

you are wanting to initialize the state of your fold from a previous parse result

Yes

I'm unsure how well it generalizes

How about add a new API? Although I can't find a good name for that except fold1 (BTW, similar API naming difficulties also exist in Haskell).

Current fold can be a simple wrapper to fold1 with init arg |_input| Ok(init()).

free functions contain grammar and fluent functions are for operating on the parsed result.

How to determine if a function is a free function or a fluent function? Why can't the init parameter be a fluent function? That also seems very reasonable.

TennyZhuang commented 4 months ago

https://github.com/TennyZhuang/winnow/commit/a6b1f04cbe9b39d218da5a121e610144ebf961b0

My implementation here, just for reference.

codeworm96 commented 4 months ago

My understanding of the problem is: assume we have 2 parsers A and B, and we want to combine them in the following grammar: seq(A, repeat(..., B)). For the repeat(..., B) part we have the fold to operate on the result. But there is no natural counterpart for seq(A, ...). We could only get the result tuple and work on it. @TennyZhuang's fold1 sounds like an abstraction for the seq(A, repeat(..., B)) pattern.

codeworm96 commented 4 months ago

By embedding repeat(..., B) inside the seq(A, repeat(..., B)) pattern, it implies we will not reuse the parser for repeat(..., B) on other location, i.e. the parser will only be used once. And now it is guaranteed init will only be called once, i.e. we can use FnOnce for init.