rust-bakery / nom

Rust parser combinator framework
MIT License

Help Wanted: how to consume until whichever comes first [tag1 | tag2]? #1646

Open mesuutt opened 1 year ago

mesuutt commented 1 year ago

Hello, I am trying to parse a string with nom.

I have text like this (the body and script lines are optional):

### header

multi
line
body

> script 1 

### header 2

body

### header 3

I want to parse the body, if it exists, up to the script line (which starts with a > character). If there is no script line, the parser should consume the body up to the next header line (which starts with ###).

I tried this:

let (i, body) = alt((take_until("###"), take_until("> "), rest))(i)?;

This uses the first parser, so it consumes everything up to ### header 2 and treats the script line as part of the body. It should instead stop when it encounters > script 1, if that line exists.

I solved the problem as shown below, but I think there is a cleaner way to do it that I have not found yet :)

    let (_, until_script) = peek(alt((take_until("> "), rest)))(i)?;
    let (_, until_title) = peek(alt((take_until("###"), rest)))(i)?;

    let mut body = Span::new("");
    let mut j = Span::new("");

    // Compare the lengths of the two peeked spans: if the script marker is farther
    // away than the next header, the body ends at the next header.
    if until_script.fragment().len() > until_title.fragment().len() {
        (j, body) = alt((take_until("###"), rest))(i)?;
    } else {
        (j, body) = alt((take_until("> "), rest))(i)?;
    }

Is there a cleaner way of doing this? Thanks in advance.

ShaddyDC commented 1 year ago

Potentially related are #1099 and #709. I made a simple implementation that suffices for my use case, but I'm new to nom, and there are certainly better ways to do it.

use nom::{
    bytes::complete::take_until, error::ParseError, FindSubstring, IResult, InputLength, InputTake,
    Parser,
};

// Note that this function may search the entire input to the end repeatedly.
// It also does some unnecessary clones.
// This seems to be fine for my purposes, but reader beware.
pub fn take_until_multiple<I, E>(matches: &[I]) -> impl FnMut(I) -> IResult<I, I, E> + '_
where
    I: Clone + InputTake + InputLength + FindSubstring<I> + HasLen,
    E: ParseError<I>,
{
    // `move` copies the `matches` reference into the closure instead of borrowing the local.
    move |input| {
        matches
            .iter()
            .map(|s| take_until::<I, I, E>(s.clone()).parse(input.clone()))
            .min_by_key(|v| v.as_ref().map(|(_, s)| s.len()).unwrap_or(usize::MAX))
            .expect("array should not be empty")
    }
}

pub trait HasLen {
    fn len(&self) -> usize;
}

impl HasLen for &str {
    fn len(&self) -> usize {
        str::len(self)
    }
}

#[test]
fn factor_test() {
    use nom::{
        error::{Error, ErrorKind},
        Err,
    };

    fn take(input: &str) -> IResult<&str, &str> {
        take_until_multiple(&["M1", "M2"])(input)
    }

    assert_eq!(
        take("match M1 in the middle"),
        Ok(("M1 in the middle", "match "))
    );
    assert_eq!(
        take("match M2 in the middle"),
        Ok(("M2 in the middle", "match "))
    );
    assert_eq!(
        take("no matches"),
        Err(Err::Error(Error::new("no matches", ErrorKind::TakeUntil)))
    );
}
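
For comparison, the "stop at whichever marker comes first" idea can also be sketched without nom, using only the standard library. This is a rough illustration rather than a drop-in combinator: `split_at_first_of` is a hypothetical helper name (not part of nom), it works only on `&str`, and it returns `None` instead of a nom error when no marker is found.

```rust
/// Splits `input` at the earliest occurrence of any marker in `markers`.
///
/// Returns `(rest, taken)` in the same order nom uses, or `None` if none of
/// the markers occurs. Hypothetical helper for illustration; not a nom API.
pub fn split_at_first_of<'a>(input: &'a str, markers: &[&str]) -> Option<(&'a str, &'a str)> {
    markers
        .iter()
        // Byte offset of each marker that actually occurs in the input.
        .filter_map(|m| input.find(*m))
        // The smallest offset belongs to the marker that comes first.
        .min()
        .map(|idx| {
            let (taken, rest) = input.split_at(idx);
            (rest, taken)
        })
}
```

Applied to the original question, calling it with the markers "###" and "> " stops at the script line when one precedes the next header, and at the header otherwise. A caller that wants the behaviour of `rest` can fall back to the whole input when the result is `None`.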