Open tjpalmer opened 1 year ago
Funny enough I spent the weekend exploring regex engines.
I've managed to get all the suggested ASCII categories working. [[:digit:]ab] is a class that matches a base-12 digit, for instance.
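To illustrate what a class like [[:digit:]ab] means, here is a minimal Python sketch. It is not the Temper engine's implementation: Python's built-in `re` module does not understand POSIX bracket classes, so the hypothetical helper `expand_posix_classes` and the `POSIX_ASCII` table below (both names invented for this example) rewrite them into plain ASCII ranges first. The rewrite is deliberately naive and does not check that `[:name:]` actually sits inside a bracket expression.

```python
import re

# ASCII-only expansions of a few POSIX bracket classes.
# Hypothetical table for illustration; not taken from the Temper regex engine.
POSIX_ASCII = {
    "alnum": "0-9A-Za-z",
    "alpha": "A-Za-z",
    "digit": "0-9",
    "lower": "a-z",
    "upper": "A-Z",
    "xdigit": "0-9A-Fa-f",
}


def expand_posix_classes(pattern: str) -> str:
    """Replace each [:name:] with its ASCII range so `re` can compile it.

    Raises KeyError for class names missing from POSIX_ASCII.
    """
    return re.sub(
        r"\[:([a-z]+):\]",
        lambda m: POSIX_ASCII[m.group(1)],
        pattern,
    )


# [[:digit:]ab] becomes [0-9ab]: the ten decimal digits plus 'a' and 'b'.
base12_digit = re.compile(expand_posix_classes("[[:digit:]ab]"))
```

With this sketch, `base12_digit.fullmatch("b")` matches while `base12_digit.fullmatch("c")` does not, since only twelve characters are in the class.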
It can be found on the (currently private) temper-regex-engine branch named ext.
Sounds good. I'm especially concerned with things I think people are likely to need. And some Unicode categories fit into that if people want to deal with arbitrary human language.
It would be nice to support Unicode properties for advanced natural language text parsing, but we have some potential problems:
regex isn't a bad option.
Some solutions: