refi64 / rejit

A work-in-progress JIT-powered regex engine
Mozilla Public License 2.0

Better Unicode support #13

Open refi64 opened 8 years ago

refi64 commented 8 years ago

It's kind of half-way there at the moment, but it also kind of isn't (try using . with a UTF-32 string; it only matches bytes). My idea:

refi64 commented 8 years ago

Actually, screw the implicit conversions. It'll just be a runtime error to call rejit_match with a Unicode regex. Reasoning: converting implicitly would totally screw up anything and everything related to positioning (e.g. groups, the actual return value, etc.).
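
To make the positioning problem concrete, here is a rough sketch of how offsets drift once a UTF-32 string is re-encoded behind the caller's back. It assumes the implicit conversion would target UTF-8, which the thread does not actually state. Both helpers are hypothetical and made up for illustration; neither is part of rejit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of rejit: number of UTF-8 bytes needed to
 * encode one code point. Assumes cp is a valid Unicode scalar value. */
static size_t utf8_width(uint32_t cp) {
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    return 4;
}

/* Hypothetical helper, not part of rejit: byte offset in the UTF-8 encoding
 * that corresponds to index idx (counted in code points) of the UTF-32
 * string s. The caller must ensure idx does not exceed the length of s. */
static size_t utf32_index_to_utf8_offset(const uint32_t *s, size_t idx) {
    size_t off = 0;
    for (size_t i = 0; i < idx; i++) {
        off += utf8_width(s[i]);
    }
    return off;
}
```

For the five code points 'a', U+00E9, U+20AC, U+1F600, 'b', the final 'b' sits at index 4 in UTF-32 but at byte offset 10 in UTF-8 (1 + 2 + 3 + 4). Any group boundary or return value computed on the converted buffer would need this kind of remapping before the caller could use it, and the remapping is a linear scan rather than a constant-time lookup.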