temperlang / temper

Regex Unicode properties #9

Open tjpalmer opened 1 year ago

tjpalmer commented 1 year ago

It would be nice to support Unicode properties for advanced natural language text parsing, but we have some potential problems:

Some solutions:

ShawSumma commented 1 year ago

Funnily enough, I spent the weekend exploring regex engines.

I've managed to get all the suggested ASCII categories working. For instance, [[:digit:]ab] is a class that matches a base-12 digit.

It can be found on the (currently private) temper-regex-engine branch named ext.
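To make the base-12 example above concrete, here is a minimal sketch in Python rather than in Temper or the engine on that branch. Python's `re` module does not support POSIX bracket classes such as [[:digit:]], so the class is spelled out as 0-9 by hand; the helper names `is_base12` and `parse_base12` are hypothetical and exist only for illustration.

```python
import re

# Hand-expanded equivalent of the POSIX-style class [[:digit:]ab]:
# Python's `re` has no [[:digit:]] syntax, so 0-9 is written out.
BASE12_DIGIT = re.compile(r"[0-9ab]")


def is_base12(text: str) -> bool:
    """Return True when text is non-empty and every character is 0-9, a, or b."""
    return bool(text) and all(BASE12_DIGIT.fullmatch(ch) for ch in text)


def parse_base12(text: str) -> int:
    """Parse a lowercase base-12 numeral, rejecting anything outside the class."""
    if not is_base12(text):
        raise ValueError(f"not a base-12 numeral: {text!r}")
    return int(text, 12)
```

Note that the class only covers lowercase a and b, so an input like "1B" is rejected here even though `int("1B", 12)` on its own would accept it.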

tjpalmer commented 1 year ago

Sounds good. I'm especially concerned with the things I think people are likely to need, and some Unicode categories fit that description if people want to deal with arbitrary human languages.
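As a rough illustration of why Unicode categories matter for arbitrary human language, the sketch below uses Python's standard `unicodedata` module to test a character's General_Category. This is not Temper's API and says nothing about how the regex engine would expose properties; the helper names `in_category` and `keep_letters` are assumptions made up for this example.

```python
import unicodedata


def in_category(ch: str, prefix: str) -> bool:
    """True when ch's Unicode General_Category starts with prefix (e.g. "L" or "Nd")."""
    return unicodedata.category(ch).startswith(prefix)


def keep_letters(text: str) -> str:
    """Keep only characters whose category is a letter (L*), in any script."""
    return "".join(ch for ch in text if in_category(ch, "L"))
```

An ASCII-only class such as [a-z] would miss accented Latin letters and every non-Latin script, whereas a category test like this treats them all as letters. Regex engines that support Unicode properties typically expose the same idea with syntax along the lines of \p{L}.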