rust-bakery / nom

Rust parser combinator framework
MIT License
9.17k stars 792 forks source link

Thread state through nom combinators #1419

Open joshrule opened 2 years ago

joshrule commented 2 years ago

Prerequisites

Here are a few things you should provide to help me understand the issue:

Test case

I'm updating a parser from nom 4 to nom 7. I'm parsing term rewriting systems, which involves parsing a lot of first-order terms. These first-order terms can contain function symbols and constants, collectively called operators, as well as variables. The identity of each operator and variable is maintained by assigning it a numeric id, and these ids are established while parsing. The parser maintains a map between strings and ids.

In nom 4, I created a parser struct and used combinations of the method! and call_m! macros to build up the id map. In nom 7, these macros no longer exist, nor do corresponding functions.

My first attempt was to translate something like:

method!(application<Parser<'a>, CompleteStr, Term>, mut self,
        alt!(call_m!(self.standard_application) |
             call_m!(self.binary_application))
);

into

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt((
        |x: &'a str| self.standard_application(x),
        |x: &'a str| self.binary_application(x),
    ))(s)
}

but the compiler helpfully reminds me that this requires simultaneous unique access to self in both closures.

Perhaps nom could provide the option to fold some sort of mutable state type through the combinators so that I could write:

fn application(&mut self, s: &'a str) -> IResult<&'a str, Term> {
    alt_with_state(self, (
        |x: &'a str, this /*: &mut Self */| this.standard_application(x),
        |x: &'a str, this| this.binary_application(x),
    ))(s)
}
joshrule commented 2 years ago

Additional discussion on the Rust Discourse.

Stargateur commented 2 years ago

I'm think you are doing it wrong, a parser should not require an external state like that. That doesn't make sense that if a parser fail it mutate the state. You would mutate the state in the top level parser in function of what your subparser return.

I'm opposite to this feature, FnMut already allow state and sharing state between parser in alt is bad design.

joshrule commented 2 years ago

Thanks for responding. I'm not sure exactly how to apply your advice. I'm trying to parse text directly into first-order terms whose symbols are identified with integers. Mutating state as I go along to build a map between symbol names and integer IDs seems like a natural approach that avoids multiple passes through the data. How else would you suggest I solve this problem?

Stargateur commented 2 years ago

Thanks for responding. I'm not sure exactly how to apply your advice. I'm trying to parse text directly into first-order terms whose symbols are identified with integers. Mutating state as I go along to build a map between symbol names and integer IDs seems like a natural approach that avoids multiple passes through the data. How else would you suggest I solve this problem?

Unfortunately, I didn't understand much, first-order terms doesn't mean much without context. If possible could you make me a little example of how you would use alt_with_state with your parser ? I believe I already explain a solution, your parser should return result and you should use this result to build your state but maybe with a code example it would be more clear for both of us. I try to look at the code of your crate but that was too hard to follow without a lot of time to read.

Geal commented 2 years ago

could the mutable state be carried through a Rc<RefCell<>> from one parser to the next?

joshrule commented 2 years ago

@Geal - That's basically the solution I've landed on for now. I just wanted to bring up the friction point here in case either:

  1. there's a known idiom or tutorial that I've missed somewhere

    The potential to mutate some sort of parser state seemed to a common use case in, e.g., nom 4, so at the very least, it would be nice to have some sort of discussion in the documentation of how to handle this situation or why it should be avoided.

  2. there's an alternate interface that nom could consider introducing to support mutable parser state.

Stargateur commented 2 years ago

why it should be avoided

Cause it's become very hard to trace side effect, nom changed parser type for FnMut allowing some mutable state, that generally should be avoided cause... side effect should always be avoided if possible. But here you introduce a share side effect over multiple parser. This mean that in some of your parser inside alt you mutate the state by mistake or whatever, it's become very hard to debug the code. Limiting the ability for the user to make mistake is also a part of any good library. I'm not strongly opposite to add such feature just a little but I think you should think twice before use it.

how to handle this situation

Some pseudo code:

enum App {
    Standard(StandardApp),
    Binary(BinaryApp),
}

fn application(state: &HashMap<Id, Foo>) -> impl Parser<&'a str, App> {
    |input: &'a str|
    alt((
        map(|x: &'a str| self.standard_app(state)(x), App::Standard),
        map(|x: &'a str| self.binary_app(state)(x), App::Binary),
    ))(input)
}

fn handle_state(input: &'a str) -> IResult<&'a str, HashMap<Id, Foo>> {
  let mut state = HashMap::new();
  // use fold or while let or whatever and update the state
  while let Ok((input, app)) = application(&state)(input) {
    match app {
       // do thing like state.entry(id).or_insert_with(app);
    }
  }
}

Probably missed thing but the general idea is here, this have pro & con:

przygienda commented 8 months ago

there is another very good use case for at least to carry read only data through the parser. Variants. Sometimes complex parsers have "well, accept this but not that for compatibility reasons, leads to bunch of alts driven by an if possibly". Completely stateless parsers are cute, like perfect FSMs. practically speaking, most realistic complex code does extended FSM with state or parsers that are nob'ed for behavior

przygienda commented 8 months ago

a possible solution is to actually implement the whole Input trait zoo for something that is (configuration, &[u8]) e.g. which leads quickly into a nice lifetime hell ;-) But in a sense this is the clean solution to parametrize the parser.

epage commented 8 months ago

For state

przygienda commented 8 months ago

unfortunately 0.0 crate and zero doc.

Having had a quick look I basically implemented the pori Stateful stuff myself now

On Mon, Oct 30, 2023 at 1:56 PM Ed Page @.***> wrote:

For state

— Reply to this email directly, view it on GitHub https://github.com/rust-bakery/nom/issues/1419#issuecomment-1785136827, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANDIC5BCCJGN5SSP7CTEJDYB6PWDAVCNFSM5FBKA6D2U5DIOJSWCZC7NNSXTN2JONZXKZKDN5WW2ZLOOQ5TCNZYGUYTGNRYGI3Q . You are receiving this because you commented.Message ID: @.***>