lune-org / lune

A standalone Luau runtime
https://lune-org.github.io/docs
Mozilla Public License 2.0

RegexCaptures will only return 1 capture #233

Closed ActualMasterOogway closed 2 months ago

ActualMasterOogway commented 3 months ago

RegexCaptures can only return 1 capture

```luau
local Regex = require("@lune/regex")

local content = [[
New WindowsPlayer version-3243b6d003cf4642 at 7/9/2024 4:26:48 PM, file version: 0, 633, 1, 6330512, git hash: 0.633.1.6330512 ...
New Studio64 version-d662de9eda0c4cec at 7/9/2024 4:30:25 PM, file version: 0, 633, 1, 6330512, git hash: 0.633.1.6330512 ...
New Studio64 version-d662de9eda0c4cec at 7/9/2024 4:31:56 PM, file version: 0, 633, 1, 6330512, git hash: 0.633.1.6330512 ...
New WindowsPlayer version-0148e7f69b394963 at 7/16/2024 10:57:50 AM, file version: 0, 634, 0, 6340420, git hash: 0.634.0.6340420 ...
New Studio64 version-95b9b7bdfe164744 at 7/16/2024 11:04:39 AM, file version: 0, 634, 0, 6340420, git hash: 0.634.0.6340420 ...
New Studio64 version-95b9b7bdfe164744 at 7/16/2024 11:06:03 AM, file version: 0, 634, 0, 6340420, git hash: 0.634.0.6340420 ...
New WindowsPlayer version-2e10d35f26294ab6 at 7/23/2024 6:48:24 PM, file version: 0, 635, 0, 6350588, git hash: 0.635.0.6350588 ...
New Studio64 version-258fa44b42074cfc at 7/23/2024 6:51:40 PM, file version: 0, 635, 0, 6350588, git hash: 0.635.0.6350588 ...
New Studio64 version-258fa44b42074cfc at 7/23/2024 6:53:04 PM, file version: 0, 635, 0, 6350588, git hash: 0.635.0.6350588 ...
New WindowsPlayer version-2232b4b2bca342cb at 7/30/2024 11:02:29 AM, file version: 0, 636, 0, 6360624, git hash: 0.636.0.6360624 ...
New Studio64 version-5f3c97c9e091442a at 7/30/2024 11:04:19 AM, file version: 0, 636, 0, 6360624, git hash: 0.636.0.6360624 ...
New Studio64 version-5f3c97c9e091442a at 7/30/2024 11:05:40 AM, file version: 0, 636, 0, 6360624, git hash: 0.636.0.6360624 ...
]]

local Regginator = Regex.new("New Studio64 version-[a-f0-9]{16} at [^,]+, file version: [^,]+, [^,]+, [^,]+, [^,]+, git hash: ([0-9.]+)")

local captures = Regginator:captures(content)
print(#captures)
for i = #captures, 1, -1 do
    local match = captures:get(i)

    print("Git hash: " .. match.text)
    break
end
```

In this case, `#captures` always returns 1 and the git hash is 0.633.1.6330512. This shouldn't be the case; there are at least 8 captures.

PhantomShift commented 2 months ago

I believe this is actually a misunderstanding of the regex crate's capture API: `Regex::captures(haystack)` only returns the captures of the first match of the regex in the given haystack. What you want (which unfortunately isn't exposed at the moment) is `captures_iter`, which returns an iterator that yields each successive match until the end of the haystack.

PhantomShift commented 2 months ago

Just as a proof of concept, this is an example of how captures_iter could be exposed based on what I saw from the captures.rs source file. Captures are only implemented as tables since the LuaCaptures userdata is rather tightly restricted by its constructor and I didn't want to mess with the other files too much, but it's pretty much all you need to encapsulate the exposed functionality aside from the format method.

Patch

```diff
diff --git a/crates/lune-std-regex/src/regex.rs b/crates/lune-std-regex/src/regex.rs
index 2ae26d9..83fe512 100644
--- a/crates/lune-std-regex/src/regex.rs
+++ b/crates/lune-std-regex/src/regex.rs
@@ -42,6 +42,10 @@ impl LuaUserData for LuaRegex {
             Ok(LuaCaptures::new(&this.inner, text))
         });
 
+        methods.add_method("capturesIter", |_, this, text: String| {
+            Ok(LuaRegexCapturesIter::new(&this.inner, text))
+        });
+
         methods.add_method("split", |_, this, text: String| {
             Ok(this
                 .inner
@@ -74,3 +78,58 @@ impl LuaUserData for LuaRegex {
         fields.add_meta_field(LuaMetaMethod::Type, "Regex");
     }
 }
+
+type CaptureMatches<'a> = regex::CaptureMatches<'a, 'a>;
+self_cell::self_cell! {
+    struct LuaRegexCapturesIterInner {
+        owner: (Arc<Regex>, Arc<String>),
+        #[covariant]
+        dependent: CaptureMatches,
+    }
+}
+
+struct LuaRegexCapturesIter {
+    inner: LuaRegexCapturesIterInner,
+}
+
+impl LuaRegexCapturesIter {
+    fn new(pattern: &Regex, haystack: String) -> Self {
+        let inner = LuaRegexCapturesIterInner::new(
+            (Arc::from(pattern.to_owned()), Arc::from(haystack)),
+            |(pattern, haystack)| pattern.captures_iter(haystack.as_str()),
+        );
+
+        Self { inner }
+    }
+}
+
+impl LuaUserData for LuaRegexCapturesIter {
+    fn add_methods<'lua, M: LuaUserDataMethods<'lua, Self>>(methods: &mut M) {
+        methods.add_meta_method_mut(LuaMetaMethod::Call, |lua, this, _any: LuaMultiValue| {
+            this.inner.with_dependent_mut(|(pattern, haystack), iter| {
+                let v = iter.next().and_then(|captures| {
+                    lua.create_sequence_from(
+                        captures
+                            .iter()
+                            .map_while(|m| m.map(|m| LuaMatch::new(haystack.clone(), m))),
+                    )
+                    .ok()
+                    .map(|t| {
+                        for (s, m) in pattern.capture_names().flatten().filter_map(|s| {
+                            captures
+                                .name(s)
+                                .map(|m| (s, LuaMatch::new(haystack.clone(), m)))
+                        }) {
+                            t.set(s, m).expect("not a locked table");
+                        }
+                        t
+                    })
+                });
+                match v {
+                    Some(v) => v.into_lua_multi(lua),
+                    None => LuaNil.into_lua_multi(lua),
+                }
+            })
+        });
+    }
+}
```

Example script

```luau
local Regex = require("@lune/regex")

local CAPTURE = Regex.new([[(?P<left>\w+) *\| *(?P<right>\w+)]])

local subject = [[
left | side
side | right
1 |2
wont |
3| 4
match | | these
]]

local c = CAPTURE:capturesIter(subject)
print(c)

for capture in c do
    print(capture)
end

print(c())
print(c())
```

```
{
    [1] = ,
    [2] = ,
    [3] = ,
    left = ,
    right = ,
}
{
    [1] = ,
    [2] = ,
    [3] = ,
    left = ,
    right = ,
}
{
    [1] = ,
    [2] = ,
    [3] = ,
    left = ,
    right = ,
}
{
    [1] = ,
    [2] = ,
    [3] = ,
    left = ,
    right = ,
}
nil
nil
```

This is just a proof of concept. Ideally you would expose LuaCaptures instead of tables, in case you guys decide to expose more of the captures methods, but my brain is a bit too small (and I'm too lazy) to figure it out.