shonfeder / tokenize

A tokenizer written in (SWI-)Prolog. It has some useful features and some flexibility and it might improve.
The Unlicense

String-like things #9

Closed AnnieAtHasura closed 5 years ago

AnnieAtHasura commented 5 years ago

I propose adding a repeatable option string(Type, SepChar).

This would parse string-like things, handling escapes according to the SWI-Prolog \ escape conventions, which are widely shared with other languages. Type would become the functor of the token emitted when the tokenizer encounters SepChar, then a sequence of other characters (possibly with escapes), then SepChar again. The token's content would be a SWI-Prolog string.

Typical usage:

    string(double_quote, 0'")

Then the text

    "hello \"Bob\", I see we all have fake ID today."

would become

    double_quote("hello \"Bob\", I see we all have fake ID today.")

This seems an adequate compromise between flexibility and complexity for handling the often awkward problem of quoted strings in language tokenizing. String recognition is a task that traditionally fits with tokenizing, rather than parsing.
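
To make the proposal concrete, here is a rough DCG sketch of how one string(Type, SepChar) option might be recognized. It is not code from this library: the names string_token//3, string_body//2 and escape_code/3 are hypothetical, and it handles only a few escapes (\n, \t, \\ and the escaped separator) rather than the full SWI-Prolog escape conventions.

```prolog
% Hypothetical sketch, not part of tokenize.
% string_token(+Type, +SepChar, -Token)//
% Matches SepChar ... SepChar on a code list and builds Type(String).
string_token(Type, Sep, Token) -->
    [Sep],
    string_body(Sep, Codes),
    [Sep],
    { string_codes(Str, Codes),
      Token =.. [Type, Str] }.

% string_body(+SepChar, -Codes)//
% Collects codes up to the first unescaped SepChar.
string_body(Sep, [C|Cs]) -->
    [0'\\, E], !,                  % a backslash followed by any code
    { escape_code(E, Sep, C) },    % fails on escapes this sketch does not know
    string_body(Sep, Cs).
string_body(Sep, [C|Cs]) -->
    [C], { C \== Sep }, !,         % an ordinary character
    string_body(Sep, Cs).
string_body(_, []) --> [].

% escape_code(+Escaped, +SepChar, -Code)
% Assumed subset of the escape table.
escape_code(0'n,  _,   0'\n).
escape_code(0't,  _,   0'\t).
escape_code(0'\\, _,   0'\\).
escape_code(Sep,  Sep, Sep).       % \" inside "..." yields a plain "
```

A query along these lines should succeed under those assumptions:

```prolog
?- string_codes("\"hello \\\"Bob\\\"\"", Cs),
   phrase(string_token(double_quote, 0'", T), Cs),
   T == double_quote("hello \"Bob\"").
```

Since the option is meant to be repeatable, a tokenizer could try one such rule per string(Type, SepChar) option it was given.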