tweag / nickel

Better configuration for less
https://nickel-lang.org/
MIT License

`std.string.find_all` #1867

Closed fuzzypixelz closed 3 months ago

fuzzypixelz commented 3 months ago

Is your feature request related to a problem? Please describe.

Currently there is no function in std.string to find all matches of a regular expression in a string. The existing function std.string.find only returns the first match.
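To make the gap concrete, here is a small illustration of the current behavior. The regex and input string are made up for this example.

```nickel
# `std.string.find` reports only the first match of the regex.
std.string.find "[0-9]" "a1b2c3"
# Evaluates to a record describing the first digit only:
#   { matched = "1", index = 1, groups = [] }
# The later digits "2" and "3" are never reported.
```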

Describe the solution you'd like

I would like a new function std.string.find_all of the following type:

find_all : String -> String -> Array { matched : String, index : Number, groups : Array String }
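As a rough illustration of that type, the value below is what a call such as `std.string.find_all "[0-9]" "a1b2c3"` might be expected to return. This is an assumption about the proposed semantics (one record per match, indices relative to the input string), written out by hand rather than produced by an existing function.

```nickel
# Hand-written, hypothetical result of the proposed `find_all`, checked
# against the proposed element type with a contract annotation.
[
  { matched = "1", index = 1, groups = [] },
  { matched = "2", index = 3, groups = [] },
  { matched = "3", index = 5, groups = [] },
] | Array { matched : String, index : Number, groups : Array String }
```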

Describe alternatives you've considered

A pure-Nickel implementation would look like:

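# Sketch only: `find_all` is bound with `let rec` so that the recursive call
# resolves, and `offset` keeps each `index` relative to the original string.
let rec find_all = fun offset r s =>
  let m = std.string.find r s in
  let len = std.string.length s in
  if m.index == -1 then
    []
  else
    # Advance past the match, by at least one character, so that an empty
    # match cannot recurse forever.
    let start = m.index + std.number.max 1 (std.string.length m.matched) in
    let m' = { matched = m.matched, index = offset + m.index, groups = m.groups } in
    if start >= len then
      [m']
    else
      [m'] @ find_all (offset + start) r (std.string.substring start len s)
in
find_all 0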

This could be added to std as is, or reimplemented as a primitive operation using regex::Regex::find_iter. To turn the Matches iterator into a Nickel array, we might need to consume the entire iterator at once.

The downside of the first option is that we miss out on the efficiency of a primitive operation. The downside of the second option is more implementation work and a lack of laziness.

Additional context

N/A

yannham commented 3 months ago

I think find_all would make sense, yes. I would go for option 2: because Nickel arrays aren't lazy as in, say, Haskell, the first solution is not really lazy either, since @ will force pretty much everything. Also, regexes are a core string-processing tool, and it's reasonable to have those operations as primops for performance reasons in a configuration language, IMHO. Happy to review a contribution for this.