petitparser / dart-petitparser

Dynamic parser combinators in Dart.
https://pub.dartlang.org/packages/petitparser
MIT License

Range with padding #82

Closed (venkatd closed 4 years ago)

venkatd commented 4 years ago

Hi, I was curious whether there is a good way to implement a range with padding. I'm currently trying to implement the ISO 8601 spec, and a lot of its numbers have a fixed length. See the grammar below (the ABNF from RFC 3339, a profile of ISO 8601).

date-fullyear   = 4DIGIT
date-month      = 2DIGIT  ; 01-12
date-mday       = 2DIGIT  ; 01-28, 01-29, 01-30, 01-31 based on
                         ; month/year
time-hour       = 2DIGIT  ; 00-23
time-minute     = 2DIGIT  ; 00-59
time-second     = 2DIGIT  ; 00-58, 00-59, 00-60 based on leap second
                         ; rules
time-secfrac    = "." 1*DIGIT
time-numoffset  = ("+" / "-") time-hour ":" time-minute
time-offset     = "Z" / time-numoffset

partial-time    = time-hour ":" time-minute ":" time-second
                 [time-secfrac]
full-date       = date-fullyear "-" date-month "-" date-mday
full-time       = partial-time time-offset

date-time       = full-date "T" full-time

I would like a helper along these lines:

Parser<String> num2Digit(int n) => ?

Any idea on the best way to implement this? Thanks!

renggli commented 4 years ago

The validation of data is typically done in a second pass, but you can of course also do it while parsing:

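Parser<String> num2Digit(int n) => digit().repeat(2).flatten().map((str) {
  // Validate the matched two-digit text while parsing.
  final value = int.parse(str);
  if (value > n) throw ArgumentError.value(str);
  // Return the matched text so the result type stays Parser<String>.
  return str;
});

venkatd commented 4 years ago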

Thanks! The high level behavior I am currently after is:

List<DocumentToken> tokenizeBody(
  String message, {
  Iterable<User> users = const [],
}) =>
    documentTokenParser(users).matchesSkipping(message);

Here, message is text in a custom markup internal to our app.

Each object is a subclass of the abstract DocumentToken class. I have UserToken, MentionToken, LinkToken, DateTimeToken, and so on. This works 99% of the time, but sometimes our parser encounters invalid tokens, such as a timestamp that is not valid ISO 8601.

If something doesn't match, I'd prefer to fall back to a catch-all StringToken, so that our UI just renders the text as-is. If I were to throw an exception as in the snippet above, would the parser fail the match and try the next pattern?

P.S. Thanks for the great library. I always found parsers intimidating and resorted to RegEx to avoid writing parsers. This library makes writing parsers a lot more accessible :)

renggli commented 4 years ago

You can achieve that with the continuation parser:

Parser<String> num2Digit(int n) => digit().repeat(2).flatten().callCC((continuation, context) {
  final result = continuation(context);
  if (result.isSuccess && int.parse(result.value) > n) {
    return context.failure('Larger than $n');
  } else {
    return result;
  }
});
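To connect this to the fallback question above: a failure returned from callCC is an ordinary parse failure, so an ordered choice can move on to the next alternative, whereas an exception thrown inside map would, as far as I can tell, escape from parse() instead. Below is a minimal sketch, assuming a petitparser version that provides callCC, or and times. The 'month:' and 'text:' tags are hypothetical stand-ins for the app's token classes, and `is Success` is used in place of the isSuccess getter.

```dart
import 'package:petitparser/petitparser.dart';

// Validating parser from the reply above: it returns a failure (rather than
// throwing) when the two-digit value exceeds [n].
Parser<String> num2Digit(int n) =>
    digit().repeat(2).flatten().callCC((continuation, context) {
      final result = continuation(context);
      if (result is Success && int.parse(result.value) > n) {
        return context.failure('Larger than $n');
      }
      return result;
    });

// Ordered choice: the catch-all branch is only tried when the month branch
// fails. The string tags are hypothetical placeholders for token objects.
final Parser monthOrText = num2Digit(12)
    .map((value) => 'month:$value')
    .or(any().times(2).flatten().map((value) => 'text:$value'));
```

With this, monthOrText.parse('09') should take the first branch, while '15' fails the range check and falls through to the catch-all branch.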

renggli commented 4 years ago

One idea to make this a bit simpler would be to add the following helpers to Parser<T>:

  1. Parser<R> flatMap<R>(Result<R> Function(T value) callback), or
  2. Parser<T> where(bool Function(T value) predicate, String errorMessage)

The reason not to add them is that both are specializations of the existing callCC. Also, having to provide an errorMessage is somewhat unusual for a method that borrows its name from existing APIs.
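To illustrate why the second helper is just a specialization of callCC, here is a minimal sketch written as an extension method. The name whereSketch is hypothetical (chosen to avoid clashing with anything the library may define), and the code assumes a petitparser version with callCC and context.failure as used earlier in this thread.

```dart
import 'package:petitparser/petitparser.dart';

extension WhereSketch<T> on Parser<T> {
  /// Hypothetical helper: succeeds only when [predicate] accepts the parsed
  /// value, otherwise fails with [errorMessage] at the starting position.
  Parser<T> whereSketch(
    bool Function(T value) predicate,
    String errorMessage,
  ) =>
      callCC((continuation, context) {
        final result = continuation(context);
        if (result is Success && !predicate(result.value)) {
          return context.failure(errorMessage);
        }
        return result;
      });
}

// Usage: a two-digit month restricted to 01-12.
final Parser<String> dateMonth = digit().repeat(2).flatten().whereSketch(
      (text) => int.parse(text) >= 1 && int.parse(text) <= 12,
      'Month out of range',
    );
```

The range check becomes a one-line predicate at the call site, at the cost of one more method on the API surface.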

venkatd commented 4 years ago

@renggli worked for me, thanks! Wasn't aware of callCC. Agree on not adding those helpers for now. Better to err on the side of keeping the API surface area small.

In case anyone who is still learning is curious, I'm pasting my full implementation below:

import 'package:petitparser/petitparser.dart';

Parser<String> _iso8601DateTime() {
  // Two digits whose numeric value must lie within the optional bounds.
  Parser<String> num2Digit({int? min, int? max}) =>
      digit().repeat(2).flatten().callCC(
        (continuation, context) {
          final result = continuation(context);
          if (result.isFailure) return result;

          final intValue = int.parse(result.value);

          if (max != null && intValue > max) {
            return context.failure('Greater than $max');
          }

          if (min != null && intValue < min) {
            return context.failure('Less than $min');
          }

          return result;
        },
      );

  final dateFullYear = digit().repeat(4).flatten();
  final dateMonth = num2Digit(min: 1, max: 12);
  final dateMday = num2Digit(min: 1, max: 31);

  final timeHour = num2Digit(max: 23);
  final timeMinute = num2Digit(max: 59);
  final timeSecond = num2Digit(max: 59);

  final timeSecFrac = (char('.') & digit().plus()).flatten();
  final timeOffset = char('Z');

  final partialTime = (timeHour &
      char(':') &
      timeMinute &
      char(':') &
      timeSecond &
      timeSecFrac.optional());

  final fullDate = dateFullYear & char('-') & dateMonth & char('-') & dateMday;
  final fullTime = partialTime & timeOffset;

  return (fullDate & char('T') & fullTime).flatten();
}

final iso8601DateTime = _iso8601DateTime();

// Tests, in a separate file:
import 'package:test/test.dart';
import 'iso8601.dart';
import 'package:petitparser/petitparser.dart';

void main() {
  group('iso8601', () {
    test('valid time', () {
      final res = iso8601DateTime.parse('2020-10-08T21:57:50.118523Z');
      expect(res.value, '2020-10-08T21:57:50.118523Z');
    });
    test('time with month out of range', () {
      final res = iso8601DateTime.parse('2020-15-08T21:57:50.118523Z');
      expect(res.isSuccess, false);
    });
    test('skips over invalid times', () {
      final res = iso8601DateTime.matchesSkipping(
          'hi 2020-10-08T21:57:50.118523Z 2020-15-08T21:57:50.118523Z 2020-12-08T21:57:50.118523Z yay cool');
      expect(
          res, ['2020-10-08T21:57:50.118523Z', '2020-12-08T21:57:50.118523Z']);
    });
  });
}
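
The implementation above accepts only the "Z" offset, while the grammar quoted at the top of the thread also allows numeric offsets (time-numoffset). Below is a minimal sketch of how that rule might be added with the same callCC technique. The helper name twoDigitsUpTo is hypothetical, and the code assumes a petitparser version that provides anyOf, char, callCC and the & and | operators.

```dart
import 'package:petitparser/petitparser.dart';

// Hypothetical helper: two digits whose numeric value is at most [max].
Parser<String> twoDigitsUpTo(int max) =>
    digit().repeat(2).flatten().callCC((continuation, context) {
      final result = continuation(context);
      if (result is Success && int.parse(result.value) > max) {
        return context.failure('Greater than $max');
      }
      return result;
    });

// time-numoffset = ("+" / "-") time-hour ":" time-minute
final Parser<String> timeNumOffset =
    (anyOf('+-') & twoDigitsUpTo(23) & char(':') & twoDigitsUpTo(59))
        .flatten();

// time-offset = "Z" / time-numoffset
final Parser<String> timeOffset = (char('Z') | timeNumOffset).flatten();
```

Swapping this timeOffset in for the char('Z') version should let timestamps such as 2020-10-08T21:57:50+05:30 match, while an offset like +25:00 is still rejected by the hour bound.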