rust-bakery / nom

Rust parser combinator framework
MIT License
9.38k stars 804 forks source link

Fn / FnMut #1765

Closed gl-yziquel closed 4 months ago

gl-yziquel commented 4 months ago

Hi.

Coming from a functional programming background, I tend to like immutability rather than mutability. I was surprised to see that nom uses FnMut instead of Fn, and discovered the following discussion as well as a few issues.

https://github.com/rust-bakery/nom/pull/1002

So, I understand that nom moved from Fn to FnMut, historically.

I have a use case which would be better off with Fn than FnMut. I essentially use the rpds crate for fully persistent datastructures for my parsing, and as the datastructures I am using do not implement the Copy trait, and as I do not want to clone either (which should nonetheless be ok-ish with rpds), I pass these datastructures through refs. Moreover, I want my parser to have no heap allocation. So, basically, I kick the can down the road, or, rather, the ownership down the stack.

The problem with FnMut then is the following. When I implement the nom Parser trait after battling with flat_map, I get a struct, which has refs to such that rpds datastructures. And as the process method requires a &mut self, I cannot move out of these rpds datastructures, because they are hidden behind a mutable reference.

Which is rather unfortunate.

So...

... would it be conceivable (in the long run) to duplicate the interface into Fn and not only FnMut ? Or is there anything deeper, technical or social, that I am missing ?

Xiretza commented 4 months ago

Can you provide a code example that illustrates the problem? I don't understand how restricting something to Fn rather than the more general FnMut would help.

gl-yziquel commented 4 months ago

As of now, I can't extract a snippet from the codebase without spending a huge amount of time in refactoring. Maybe I will manage to do so in the next week. Dunno. I'll try some brief explanation, though. (Pseudo-code likely will be bogus, as I do not yet have the feel for rust syntax. Apologies...)

Bottom line:

Suppose you want to implement a Parser for some use with flat_map. Say you've parsed some argument literals for a lambda calculus dialect, and you've extracted the data of a, b, c from the "let f a b c = (a b) c" string. The data of the a, b, c literals gets parsed to some args. You then construct a subparser this way that owns the Vec of such literals this way.

struct SubParser<'subparser_lifetime, 'parent_parsing_lifetime> {
    parent: &'subparser_lifetime ParentParser<'parent_parsing_lifetime>,
    args: Vec<'subparser_lifetime str>
}

impl <'subparser_lifetime, 'parent_parsing_lifetime> Parser<&'parent_parsing_lifetime> for SubParser<'subparser_lifetime, 'parent_parsing_lifetime> {
    type Output = Lambda;
    type Error = nom::error::Error<'parent_parsing_lifetime str>
    fn process<OM: OutputMode>(&mut self, i: &'parent_parsing_lifetime str)
        -> PResult<OM, &'parent_parsing_lifetime str, Lambda, nom::error::Error<&'parent_parsing_lifetime str>>
    {
        // Updating of bindings done here.
    }
}

I then try to combine that with flat_map like this (in 8.0.0-alpha2):

flat_map(arguments_extractor, |args: Vec<&str>| SubParser(self, args))

Now, the ParentParser contains a reference to some rpds::HashTrieMap<String, Lambda> data structure holding the bindings of literals to lambda terms. (I am not using a parse tree: I trie to use nom to construct the lambda terms as a cyclically implemented datastructure directly in a bumpalo arena, and I wish to avoid, ultimately, all heap allocation; otherwise, using rust compared to other languages does not seem worth the trouble.)

So, If I want to update these bindings, I need to implement a new rpds::HashTrieMap. As I want to put it on the stack, and not the heap, I create, in the process() method of the SubParser such an rpds::HashTrieMap.

However, 1. rpds::HashTrieMap does not implement the Copy trait, and, it seems, rightly so. And 2. to access the original rpds::HashTrieMap to contain my new one, I need to perform a move, and the FnMut here gets in the way of that move where Fn would likely not.

I understand this is a niche case: no heap allocation, fully persistent datastructures and no mutability, direct parsing from string to executable-ish lambda calculus. Nonetheless, it seems it is a case where FnMut gets in the way if one wishes not to use Box or anything else like that.

I'll try to extract some code snippet, but please don't hold your breath: I'm currently in ugly refactoring mode around this code.

Xiretza commented 4 months ago

Are you sure what you want is Fn and not FnOnce?

gl-yziquel commented 4 months ago

&dyn FnOnce has problems being stack allocated. It's special. I had a very long chat with ChatGPT about it. So, no, I guess.

Xiretza commented 4 months ago

&dyn FnOnce

You can't call an FnOnce through a reference, it's called by value. Not sure how you ended up at a trait object anyway, nom uses generics almost exclusively.

I had a very long chat with ChatGPT about it

That explains a lot.

gl-yziquel commented 4 months ago

FnOnce captures too much. I need to be able to reference my datastructure that are higher up in the stack. Can't remove them, or my lifetimes get incoherent. I need to keep these references to keep track of my lifetimes.

As to trait objects, the reason is simple: in 7.1.3, nom used FnMut instead of Parser in may signatures. As one may not easily provide explicit types for closures in rust, either 1. you try dynamic dispatch (which works with &dyn Fn or &dyn FnMut, but not &dyn FnOnce) or 2. you move to 8.0.0-alpha2.

gl-yziquel commented 4 months ago

I'll reopen once I have code samples handy.