sliekens / Txt

A text parsing framework for .NET.
MIT License

Add 'unmatch' feature #11

Open sliekens opened 8 years ago

sliekens commented 8 years ago

Add a way to unmatch characters, where 'unmatch' means to match any character that is not in a specified set.

This is useful for grammar rules that describe a blacklist.
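
As a rough illustration of the idea, here is a minimal sketch in Python rather than against the Txt API. The names `unmatch` and `match_many` are hypothetical and exist only for this example. A matcher takes the text and a position, and returns the position after the match, or `None` on failure.

```python
from typing import Callable, Optional

Matcher = Callable[[str, int], Optional[int]]


def unmatch(excluded: str) -> Matcher:
    """Build a matcher that accepts one character NOT in `excluded`.

    Hypothetical helper for this sketch; not an existing Txt API.
    """
    blacklist = frozenset(excluded)

    def matcher(text: str, pos: int) -> Optional[int]:
        # Succeed on any single character outside the blacklist and
        # return the position just past it; fail (None) otherwise.
        if pos < len(text) and text[pos] not in blacklist:
            return pos + 1
        return None

    return matcher


def match_many(matcher: Matcher, text: str, pos: int) -> int:
    """Apply `matcher` zero or more times (like ABNF `*rule`).

    Returns the position where matching stopped.
    """
    while True:
        nxt = matcher(text, pos)
        if nxt is None:
            return pos
        pos = nxt


# A matcher for "anything except a double quote".
not_dquote = unmatch('"')
```

With this, `match_many(not_dquote, text, pos)` consumes characters up to, but not including, the next double quote, which is the behaviour a blacklist rule needs.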

ABNF (like plain BNF) has no operator for excluding characters, so blacklists are usually described in comments. ISO EBNF is an exception, since it has an except-symbol (`-`).

quoted-string = DQUOTE *(unicode-char) DQUOTE
              ; The double quote (0x22) character MUST NOT appear in a quoted-string

DQUOTE        =  %x22
              ; " (Double Quote)

unicode-char  =  %x20-FF
              ; Unicode Basic Latin + Latin-1 Supplement

sliekens commented 8 years ago

Note that it's sometimes (always?) possible to rewrite rules in a way that splits the whitelisted and blacklisted characters into two rules, so that the original rule becomes the union (alternation) of the two new rules.

     quoted-string = DQUOTE *(quoted-string-char) DQUOTE
                   ; The double quote (0x22) character MUST NOT appear in a quoted-string

            DQUOTE =  %x22
                   ; " (Double Quote)

quoted-string-char = %x20-21 / %x23-FF
                   ; Unicode Basic Latin except DQUOTE + Latin-1 Supplement

      unicode-char = quoted-string-char / DQUOTE
                   ; Unicode Basic Latin + Latin-1 Supplement
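
For rules that match a single character from one contiguous range, this rewrite looks mechanical: cut the range at each blacklisted code point and keep the pieces. A minimal sketch in Python, assuming inclusive code point ranges; `split_around` and `to_abnf` are hypothetical names for this example, not part of Txt.

```python
from typing import Iterable, List, Tuple

Range = Tuple[int, int]  # inclusive (low, high) code points


def split_around(low: int, high: int, excluded: Iterable[int]) -> List[Range]:
    """Return the sub-ranges of low..high that skip every excluded code point."""
    ranges: List[Range] = []
    start = low
    for cp in sorted(set(excluded)):
        if cp < low or cp > high:
            continue  # outside the base range, nothing to cut
        if cp > start:
            ranges.append((start, cp - 1))
        start = cp + 1
    if start <= high:
        ranges.append((start, high))
    return ranges


def to_abnf(ranges: List[Range]) -> str:
    """Format ranges as an ABNF alternation such as `%x20-21 / %x23-FF`."""
    parts = []
    for lo, hi in ranges:
        parts.append(f"%x{lo:02X}" if lo == hi else f"%x{lo:02X}-{hi:02X}")
    return " / ".join(parts)


# unicode-char is %x20-FF; the blacklist is DQUOTE (%x22).
quoted_string_char = split_around(0x20, 0xFF, [0x22])
print(to_abnf(quoted_string_char))  # prints "%x20-21 / %x23-FF"
```

This only covers blacklists of single characters within one range. Blacklisting multi-character sequences would need a different approach.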