rust-bakery / nom

Rust parser combinator framework
MIT License

`not_line_ending1`? #1227

Open EndilWayfare opened 4 years ago

EndilWayfare commented 4 years ago

I've run into a problem using nom::character::complete::not_line_ending to match the entire contents of a line up to the line ending. It works well in isolation, but parsing multiple lines with nom::multi::many{1|0} fails unexpectedly. As far as I can tell, this is because not_line_ending can succeed on empty input, which makes many{1|0} return an error rather than loop forever. Since there are no {1|0} variants, I initially assumed not_line_ending would behave like a hypothetical not_line_ending1 (requiring at least one matching character), but it actually behaves like not_line_ending0.
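To make the failure mode concrete, here is a dependency-free sketch that models it with plain functions. None of these are nom's types or implementations: ParseResult, not_line_ending0, line, many0_lines and NO_PROGRESS are hypothetical names, and the progress check only imitates the guard that many0 appears to apply.

```rust
// A std-only model of the behaviour described above; NOT nom's real API.
// On success a parser returns (remaining input, matched text).
type ParseResult<'a> = Result<(&'a str, &'a str), &'static str>;

// Hypothetical error message for the "no progress" case.
const NO_PROGRESS: &str = "many0: parser succeeded without consuming input";

/// Models `not_line_ending`: matches everything before the first '\r' or '\n'.
/// It never fails, so on empty input it succeeds with an empty match.
fn not_line_ending0<'a>(input: &'a str) -> ParseResult<'a> {
    let end = input
        .find(|c: char| c == '\r' || c == '\n')
        .unwrap_or(input.len());
    Ok((&input[end..], &input[..end]))
}

/// Succeeds at end of input, or consumes one "\r\n" or "\n".
fn eof_or_line_ending<'a>(input: &'a str) -> ParseResult<'a> {
    if input.is_empty() {
        return Ok((input, ""));
    }
    if let Some(rest) = input.strip_prefix("\r\n") {
        return Ok((rest, "\r\n"));
    }
    if let Some(rest) = input.strip_prefix('\n') {
        return Ok((rest, "\n"));
    }
    Err("expected line ending or end of input")
}

/// Models `terminated(not_line_ending, eof_or_line_ending)`.
fn line<'a>(input: &'a str) -> ParseResult<'a> {
    let (rest, content) = not_line_ending0(input)?;
    let (rest, _) = eof_or_line_ending(rest)?;
    Ok((rest, content))
}

/// Models `many0(line)`: repeats until `line` fails, but reports an error
/// if `line` succeeds without consuming anything (the infinite-loop guard).
fn many0_lines(mut input: &str) -> Result<(&str, Vec<&str>), &'static str> {
    let mut lines = Vec::new();
    loop {
        match line(input) {
            Err(_) => return Ok((input, lines)),
            Ok((rest, content)) => {
                if rest.len() == input.len() {
                    return Err(NO_PROGRESS);
                }
                lines.push(content);
                input = rest;
            }
        }
    }
}
```

In this model, the three lines of "Foo Bar\r\nBaz\r\nWow Everything Is Awesome" parse fine. The fourth iteration then runs on "", where both the line parser and the EOF check succeed without consuming anything, so the repetition reports an error instead of returning the collected lines.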

I thought that wrapping the parser in a terminated that checked for line-ending or EOF

I'm considering a take_till1 that matches on a hand-rolled is_line_ending predicate (since nom::character::is_newline doesn't work on file input from Windows), but the custom logic inside not_line_ending gave me pause. I assume there is some performance or subtle-correctness reason it isn't implemented with take_till (or some other suitable combinator), and I haven't fully reasoned that through yet.
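For reference, a minimal std-only sketch of the "at least one character" behaviour that idea would give. Here not_line_ending1 is a hypothetical stand-in for applying take_till1 to the predicate, not an existing nom function, and it returns Option rather than nom's IResult.

```rust
/// Hand-rolled predicate: true for either character of a Unix ("\n")
/// or Windows ("\r\n") line ending.
fn is_line_ending(c: char) -> bool {
    c == '\r' || c == '\n'
}

/// Hypothetical stand-in for `take_till1(is_line_ending)`.
/// Returns Some((rest, matched)) when at least one character precedes the
/// first line-ending character, and None when the match would be empty.
fn not_line_ending1(input: &str) -> Option<(&str, &str)> {
    let end = input.find(is_line_ending).unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}
```

With this variant a repetition stops cleanly at end of input, since "" no longer matches. Two caveats: a blank line also fails to match, and the predicate stops at any '\r', whereas not_line_ending, as far as I can tell, treats a '\r' that is not followed by '\n' as an error. That difference may be part of the subtle-correctness concern.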

Test case

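use std::ops::{Range, RangeFrom, RangeTo};

use nom::character::complete as nom_character;
use nom::error::{Error, ErrorKind, ParseError};
use nom::{AsChar, Compare, Err, IResult, InputIter, InputLength, Slice};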

pub fn eof<T, E: ParseError<T>>(input: T) -> IResult<T, (), E>
where
    T: InputIter + InputLength + Slice<RangeFrom<usize>>,
    <T as InputIter>::Item: AsChar,
{
    let mut it = input.iter_indices();
    match it.next() {
        Some(_) => Err(Err::Error(E::from_error_kind(input, ErrorKind::Eof))),
        None => Ok((input, ())),
    }
}

pub fn eof_or_line_ending<T, E: ParseError<T>>(input: T) -> IResult<T, (), E>
where
    T: Clone
        + InputIter
        + InputLength
        + Slice<Range<usize>>
        + Slice<RangeFrom<usize>>
        + Compare<&'static str>
        + Slice<RangeTo<usize>>,
    <T as InputIter>::Item: AsChar,
{
    nom::branch::alt((
        eof,
        nom::combinator::map(nom_character::line_ending, |_| ()),
    ))(input)
}

fn unknown(input: &str) -> IResult<&str, &str> {
    nom_character::not_line_ending(input)
}

#[derive(Clone, Debug, PartialEq)]
pub struct UnknownLine<'a>(pub &'a str);

fn unknown_line(input: &str) -> IResult<&str, UnknownLine> {
    nom::combinator::map(
        nom::sequence::terminated(unknown, eof_or_line_ending),
        UnknownLine,
    )(input)
}

pub fn parser(input: &str) -> IResult<&str, Vec<UnknownLine>> {
    nom::multi::many0(unknown_line)(input)
}

fn main() {
    let input = "Foo Bar\r\nBaz\r\nWow Everything Is Awesome";

    // Expected: one UnknownLine per input line, with all input consumed.
    let res = vec![
        UnknownLine("Foo Bar"),
        UnknownLine("Baz"),
        UnknownLine("Wow Everything Is Awesome"),
    ];
    assert_eq!(
        parser(input),
        Ok(("", res))
    ); // returns Err::Error(Error::from_error_kind("", ErrorKind::Many0))
}
EndilWayfare commented 4 years ago

Ok, decided to go a bit lower-level. Is this a dumb way to write a match1 combinator?

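use nom::error::{ErrorKind, FromExternalError};
use nom::{Err, Parser};
use thiserror::Error; // assumes the thiserror crate provides the derive below

#[derive(Debug, Error)]
#[error("Parser matched a zero-length sequence")]
pub struct EmptyMatchError;

/// Wraps `parser` so that a successful match that consumes no input
/// becomes an error.
pub fn match1<I, O, F, E>(mut parser: F) -> impl Parser<I, O, E>
where
    I: Clone + PartialEq,
    F: Parser<I, O, E>,
    E: FromExternalError<I, EmptyMatchError>,
{
    move |input: I| {
        // Keep the original input so it can be compared with the remainder.
        let i = input.clone();
        let (input, o) = parser.parse(input)?;
        if i != input {
            Ok((input, o))
        } else {
            Err(Err::Error(E::from_external_error(i, ErrorKind::MapRes, EmptyMatchError)))
        }
    }
}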