m4rw3r / chomp

A fast monadic-style parser combinator designed to work on stable Rust.
Apache License 2.0

Returning the entire slice matched by a chain of parsers #55

Closed paigeruten closed 7 years ago

paigeruten commented 7 years ago

Is there a clean way to use the parse! macro and return the entire slice that was matched? Currently, I do something like this:

// An identifier is an alphanumeric string that doesn't start with a digit.
fn identifier<I: U8Input>(i: I) -> SimpleResult<I, ()> {
    parse!{i;
        satisfy(is_alpha);
        take_while(is_alphanumeric);

        ret ()
    }
}

// An alias definition is two identifiers separated by an equals sign, e.g. "foo=bar".
fn alias<I: U8Input>(i: I) -> SimpleResult<I, (I::Buffer, I::Buffer)> {
    parse!{i;
        let (left, _)  = matched_by(identifier);
                        token(b'=');
        let (right, _) = matched_by(identifier);

        ret (left, right)
    }
}

It would be nicer if alias didn't have to use matched_by and could just say let left = identifier(). Does chomp provide a good way of doing this?

m4rw3r commented 7 years ago

There are two solutions to this: a) move matched_by into identifier, or b) use a stateful closure in identifier.

The first method is more flexible and should result in almost the exact same code unless the backtracking operation of the Input is expensive (it is free on slices, and it is not supposed to be expensive in general):

fn identifier<I: U8Input>(i: I) -> SimpleResult<I, I::Buffer> {
    matched_by(i, parser!{
        satisfy(is_alpha);
        skip_while(is_alphanumeric)
    }).map(|(b, _)| b)
}

Note the change from parse! to parser!: parser!{...} is just |i| parse!{i; ...}, a shorthand for making local parsers. I also changed take_while to skip_while, since take_while produces a result that is discarded here. That is not a problem for slice inputs (or buffered slices like chomp::buffer), but an owned input type could incur some overhead when allocating the unused Buffer. The ret is not needed in this case, since map just takes the buffer.
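To make the idea behind matched_by concrete, here is a minimal std-only sketch on plain byte slices. It does not use chomp: the Res alias and the matched helper are hypothetical stand-ins invented for illustration, and the sketch assumes the inner parser always returns a suffix of its input.

```rust
// Hypothetical stand-in for a parser result on byte slices:
// Some((remaining input, value)) on success, None on failure.
type Res<'a, T> = Option<(&'a [u8], T)>;

// Hypothetical analogue of `matched_by` (not chomp's implementation):
// run `f`, then return the slice it consumed together with its result.
// Assumes `f` returns a suffix of `input` as the remaining input.
fn matched<'a, T, F>(input: &'a [u8], f: F) -> Res<'a, (&'a [u8], T)>
where
    F: FnOnce(&'a [u8]) -> Res<'a, T>,
{
    let (rest, value) = f(input)?;
    let consumed = input.len() - rest.len();
    Some((rest, (&input[..consumed], value)))
}

// An identifier is one ASCII letter followed by any number of ASCII
// letters or digits. The inner parser returns `()`; `matched` supplies
// the slice, and `map` drops the unit value.
fn identifier<'a>(input: &'a [u8]) -> Res<'a, &'a [u8]> {
    matched(input, |i: &'a [u8]| {
        let (first, tail) = i.split_first()?;
        if !first.is_ascii_alphabetic() {
            return None;
        }
        let n = tail.iter().take_while(|b| b.is_ascii_alphanumeric()).count();
        Some((&tail[n..], ()))
    })
    .map(|(rest, (buf, _))| (rest, buf))
}
```

On a slice, recovering the matched region is just a length subtraction, which is consistent with the remark above that backtracking is free for slices.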

The stateful closure is straightforward too, though not as clean. It could be more useful in certain situations, since it avoids the backtracking that matched_by needs:

fn identifier<I: U8Input>(i: I) -> SimpleResult<I, I::Buffer> {
    let mut first = true;

    take_while1(i, |c| if first { first = false; is_alpha(c) } else { is_alphanumeric(c) })
}

EDIT: Fixed typo take_while -> take_while1
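For readers who want to try the stateful-closure trick without chomp, here is a rough std-only sketch on byte slices. The take_while1 below is a hypothetical stand-in written for this example, not chomp's function. It assumes the predicate is called once per byte, in order, and returns (remaining input, matched slice).

```rust
// Hypothetical analogue of `take_while1` on byte slices (not chomp's API):
// take the longest non-empty prefix whose bytes all satisfy `pred`.
// `pred` is `FnMut`, so it may carry state from one byte to the next.
fn take_while1<F>(input: &[u8], mut pred: F) -> Option<(&[u8], &[u8])>
where
    F: FnMut(u8) -> bool,
{
    let n = input.iter().take_while(|&&b| pred(b)).count();
    if n == 0 {
        None
    } else {
        // (remaining input, matched slice)
        Some((&input[n..], &input[..n]))
    }
}

// The first byte must be an ASCII letter; every later byte may be an
// ASCII letter or digit. The `first` flag flips after the first call.
fn identifier(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let mut first = true;
    take_while1(input, |c| {
        if first {
            first = false;
            c.is_ascii_alphabetic()
        } else {
            c.is_ascii_alphanumeric()
        }
    })
}
```

Because the matched slice comes straight out of a single forward pass, nothing has to be re-read or rewound here. The price is that the predicate silently depends on call order.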

Hope this helps!

paigeruten commented 7 years ago

Thank you, that helps a lot! That's the solution I was looking for with matched_by, I just didn't know how to put the pieces together.

m4rw3r commented 7 years ago

Awesome! :)