petitparser / dart-petitparser

Dynamic parser combinators in Dart.
https://pub.dartlang.org/packages/petitparser
MIT License

Greedy parse problem #145

Closed: nacro90 closed this issue 1 year ago

nacro90 commented 1 year ago

Hi, thanks for the awesome library.

I have a problem with parsing, and I am not sure whether the behavior is intended.

The parser below is supposed to parse bold text, meaning text surrounded by * with no whitespace right after the opening * or right before the closing *.

Parser<Sequence3> bold() => seq3(
      char("*"),
      seq3(
        whitespace().not(),
        char("*").neg().plus(),
        whitespace().not(),
      ).flatten(),
      char("*"),
    );

For this parser I wrote two tests: one with whitespace right after the opening * and one with whitespace right before the closing *. The first works as expected, but the second parses the input as ['*', 'invalid ', '*'], with the trailing whitespace included. How can I keep the parser from consuming that whitespace? I tried whitespace().neg() instead of not(), but neg() consumes a character, so the parser can no longer parse correct bold text like *bold*: the negated whitespace parser in front of the closing delimiter consumes the * itself.
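A minimal reproduction sketch, assuming petitparser 5.1 (the version reported below), where seq3 returns a Sequence3 with first/second/third getters. The bold() body is the parser from the question. The main function and the sample inputs are added only for illustration.

```dart
import 'package:petitparser/petitparser.dart';

// The parser from the question (petitparser 5.x API assumed).
Parser<Sequence3<String, String, String>> bold() => seq3(
      char('*'),
      seq3(
        whitespace().not(), // lookahead only: consumes nothing
        char('*').neg().plus(), // consumes every character up to the next '*'
        whitespace().not(), // lookahead only: checks the character after that
      ).flatten(),
      char('*'),
    );

void main() {
  final parser = bold();

  // Whitespace after the opening '*': the leading not() sees ' ' and fails.
  print(parser.accept('* invalid*')); // false

  // Whitespace before the closing '*': neg().plus() swallows "invalid ",
  // so the trailing not() only peeks at '*', which is not whitespace.
  final result = parser.parse('*invalid *');
  print(result.isSuccess); // true
  print('"${result.value.second}"'); // "invalid "
}
```

The important detail is that not() never consumes input, so the second lookahead runs after the repetition has already eaten the space.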

Working:

test("should not parse with succeeding whitespace after opening", () {
  // given
  // when
  final result = grammar.build(start: grammar.bold).accept('* invalid*');
  // then
  expect(result, isFalse);
});

Not working:

test("should not parse with preceding whitespace before closing", () {
  // given
  // when
  final result =
      trace(grammar.build(start: grammar.bold)).accept('*invalid *');
  // then
  expect(result, isFalse);
});
petitparser: ^5.1.0
Flutter 3.7.6 • channel stable • https://github.com/flutter/flutter.git
Framework • revision 12cb4eb7a0 (6 days ago) • 2023-03-01 10:29:26 -0800
Engine • revision ada363ee93
Tools • Dart 2.19.3 • DevTools 2.20.1
renggli commented 1 year ago

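The reason this doesn't work for whitespace before the closing asterisk is that char("*").neg().plus() reads over the whitespace up to right before the asterisk. The trailing whitespace().not() then only looks at the *, which is not whitespace, so it succeeds. You can fix this like so: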

final parser = seq3(
  char("*"),
  seq2(
    whitespace().not(),
    [seq2(whitespace(), char('*')), char('*')]
        .toChoiceParser()
        .neg()
        .plus(),
  ).flatten(),
  char("*"),
);
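A quick check of the fixed parser against the inputs from the issue, again assuming the petitparser 5.x API (seq2, seq3, toChoiceParser, neg, accept). The function name boldFixed is hypothetical. The parser body is the one from the answer.

```dart
import 'package:petitparser/petitparser.dart';

// The fixed parser from the answer, wrapped in a (hypothetical) function.
Parser<Sequence3<String, String, String>> boldFixed() => seq3(
      char('*'),
      seq2(
        whitespace().not(),
        // Stop the repetition before " *" as well as before "*",
        // so a space in front of the closing delimiter is never consumed.
        [seq2(whitespace(), char('*')), char('*')]
            .toChoiceParser()
            .neg()
            .plus(),
      ).flatten(),
      char('*'),
    );

void main() {
  final parser = boldFixed();
  print(parser.accept('*bold*')); // true
  print(parser.accept('*two words*')); // true: an inner space is still allowed
  print(parser.accept('* invalid*')); // false
  print(parser.accept('*invalid *')); // false
}
```

On '*invalid *' the repetition stops after "invalid" because " *" matches the first alternative, and the closing char('*') then fails on the space.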

Easier, and maybe closer to what you want, is the following:

final parser = seq3(
  char("*"),
  char('*').neg().plus().flatten(),
  char("*"),
).where((sequence) => sequence.second.trim() == sequence.second);
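The where() variant can be exercised the same way. This sketch assumes the installed petitparser version provides where() as used in the answer. The function name boldWhere is hypothetical.

```dart
import 'package:petitparser/petitparser.dart';

// The where() variant from the answer, wrapped in a (hypothetical) function.
Parser<Sequence3<String, String, String>> boldWhere() => seq3(
      char('*'),
      char('*').neg().plus().flatten(), // greedily reads everything up to '*'
      char('*'),
    // Reject the whole match if the captured text has leading/trailing whitespace.
    ).where((sequence) => sequence.second.trim() == sequence.second);

void main() {
  final parser = boldWhere();
  print(parser.accept('*bold*')); // true
  print(parser.accept('* invalid*')); // false
  print(parser.accept('*invalid *')); // false
}
```

Unlike the previous fix, this version does not prevent the whitespace from being consumed. It parses greedily and then rejects the result, which keeps the grammar short at the cost of checking after the fact.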
nacro90 commented 1 year ago

Thank you so much :hearts: