Open refi64 opened 8 years ago
Actually, screw the implicit conversions. It'll just be a runtime error to call `rejit_match` with a Unicode regex. Reasoning: implicit conversion would totally screw up anything and everything related to positioning (e.g. groups, the actual return value, etc.).
It's kind of half-way there at the moment, but it also kind of isn't (try using `.` with a UTF-32 string; it only matches bytes). My idea:

- If `RJ_FUNICODE` is passed, then compile the code to work in UTF-32 runes, NOT bytes.
- `rejit_match` can be amended to convert the input string into runes if it detects the regex had been compiled with `RJ_FUNICODE`.
- A new function will be added, `rejit_unicode_match(regex, rune_str)`. If the regex had NOT been compiled with `RJ_FUNICODE`, then this will implicitly convert the rune string to a byte string. Might be a bit risky and bordering on too much magic, but...eh.
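
For illustration, here is a rough sketch of what converting a byte string into runes might involve, and why positions get awkward once a conversion happens behind the caller's back. It assumes the input bytes are UTF-8 (the issue doesn't say which encoding the byte strings use). `utf8_to_runes` is a hypothetical helper, not an existing rejit function, and it doesn't touch the real rejit API at all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (NOT part of rejit): decode a NUL-terminated UTF-8
 * string into UTF-32 runes. byte_off[i] receives the byte offset at which
 * rune i starts, so a rune position can be mapped back to a byte position.
 * Returns the number of runes written, or (size_t)-1 on malformed input or
 * when `cap` is too small. Overlong forms and surrogates are not rejected,
 * to keep the sketch short. */
static size_t utf8_to_runes(const char *s, uint32_t *runes,
                            size_t *byte_off, size_t cap)
{
    assert(s != NULL && runes != NULL && byte_off != NULL);
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0, i = 0;

    while (p[i] != 0) {
        unsigned char c = p[i];
        uint32_t r;
        size_t len;

        /* The lead byte determines the sequence length. */
        if (c < 0x80)                { r = c;        len = 1; }
        else if ((c & 0xE0) == 0xC0) { r = c & 0x1F; len = 2; }
        else if ((c & 0xF0) == 0xE0) { r = c & 0x0F; len = 3; }
        else if ((c & 0xF8) == 0xF0) { r = c & 0x07; len = 4; }
        else return (size_t)-1; /* stray continuation or invalid lead byte */

        /* Fold in continuation bytes; a NUL here fails the check too, so
         * a truncated sequence never reads past the terminator. */
        for (size_t k = 1; k < len; k++) {
            if ((p[i + k] & 0xC0) != 0x80) return (size_t)-1;
            r = (r << 6) | (uint32_t)(p[i + k] & 0x3F);
        }

        if (n == cap) return (size_t)-1; /* output buffer full */
        runes[n] = r;
        byte_off[n] = i;
        n++;
        i += len;
    }
    return n;
}
```

With the input `"a\xC3\xA9z"` (that is, `aéz`), the third rune `z` sits at rune index 2 but byte offset 3. Any group position reported after an implicit conversion would have to be translated back through something like `byte_off`, which is the positioning headache mentioned above.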