kmarius / jsregexp

JavaScript regular expressions for Lua
MIT License

Implement `reg:exec(str)` and `reg:test(str)` #13

Closed (kmarius closed this 2 years ago)

kmarius commented 2 years ago

@nathanrpage97 I would like your opinion on how we should handle Unicode strings. As in JS, one can build match, findAll, etc. by repeatedly calling reg:exec on the input and getting the next match each time; the starting index is stored inside the RegExp object. Converting the string to UTF-16 on every call to exec would clearly be wasteful, since a loop that yields n matches would convert the whole input n times. My idea is to cache the converted string and keep a reference to the input string (in the registry?) so we can check whether we are still working on the same input.
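For reference, the JS behaviour being mirrored can be sketched as follows. This is plain JavaScript, not the jsregexp Lua API, and the helper name `findAll` is made up for illustration. It shows how a global regex keeps its position in `lastIndex` between calls to `exec`, which is why every iteration would see the same input string again.

```javascript
// Collect every match by calling exec() repeatedly on a global regex.
// exec() resumes from re.lastIndex and resets it to 0 once it returns null.
// `findAll` is a hypothetical helper name, not part of any library.
function findAll(re, input) {
  if (!re.global) {
    throw new TypeError("findAll needs a regex with the g flag");
  }
  re.lastIndex = 0; // start from the beginning of the input
  const matches = [];
  let m;
  while ((m = re.exec(input)) !== null) {
    matches.push({ text: m[0], index: m.index });
    // A zero-length match does not advance lastIndex, so step manually
    // to avoid looping forever on the same position.
    if (m[0].length === 0) {
      re.lastIndex += 1;
    }
  }
  return matches;
}

const found = findAll(/\d+/g, "a1b22c333");
console.log(found);
```

Each pass through the loop hands the same `input` to `exec`, so a binding that converts its argument on entry would repeat that conversion once per match.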

nathanrpage97 commented 2 years ago

I think implementing it naively, without caching, would be best for now. Ignore the obvious performance penalty and get the functionality working first.

Once the functions are done, we could add a special jsstring userdata type (a UTF-16 version of a string) that is accepted wherever a string is, so the user passes either a plain string or this userdata. The higher-level JS-style functions such as string.search and string.match could then hold a reference to the UTF-16 converted string, without an awkward cache in the registry.

function jsregexp.to_jsstring(string) -> userdata.jsstring
function Regexp.exec(text: string | userdata.jsstring)
function Regexp.test(text: string | userdata.jsstring)
function Regexp.search(text: string | userdata.jsstring)  -- for lua add to Regexp object

-- and so on for other functions

kmarius commented 2 years ago

I am going to merge this as is and handle UTF-16 later. Feel free to have a go at the higher-level functions.

nathanrpage97 commented 2 years ago

One idea I had is converting libregexp from UTF-16 to UTF-8. I'm not sure how easy that would be, but it could make the library drastically simpler.
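To illustrate the bookkeeping a UTF-8 engine would remove: as long as matching happens on UTF-16 data, every reported index counts 16-bit code units, while a Lua string is addressed in bytes. A rough sketch of the translation in JavaScript follows; the function name `utf16IndexToUtf8Offset` is made up here and does not come from the library.

```javascript
// Map an index counted in UTF-16 code units to an offset counted in
// UTF-8 bytes. Hypothetical helper, for illustration only.
function utf16IndexToUtf8Offset(str, utf16Index) {
  let bytes = 0;
  let i = 0;
  while (i < utf16Index && i < str.length) {
    const cp = str.codePointAt(i);
    if (cp < 0x80) bytes += 1;         // ASCII
    else if (cp < 0x800) bytes += 2;   // two-byte sequence
    else if (cp < 0x10000) bytes += 3; // rest of the BMP
    else bytes += 4;                   // astral plane
    i += cp > 0xffff ? 2 : 1;          // a surrogate pair spans two units
  }
  return bytes;
}

const sample = "\u00e9\u{1F600}x";  // "é", an emoji, then "x"
const unitIndex = sample.indexOf("x");
const byteOffset = utf16IndexToUtf8Offset(sample, unitIndex);
console.log(unitIndex, byteOffset);
```

With an engine that matched UTF-8 directly, the indices it reports would already be byte offsets and this step would not be needed.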

kmarius commented 2 years ago

Now that would be amazing. I don't even know where one would start (probably by asking Fabrice Bellard whether it is feasible).