[WIP] friendlier API: `find(re"(\w+) (\w+)")` should return a tuple (named tuple if named matches are used)

timotheecour commented 6 years ago

[ ] Closed while I work on this issue

The following would result in a much easier-to-use API:

let str = " foo bar baz"
let m = str.find(re"(\w+) (\w+)")
doAssert m.ok
doAssert m.captureBound() == (1, "foo bar".len) # bounds
doAssert m.capture(str) == "foo bar"

# this is possible with string slices (I'm working on it), doesn't allocate, just uses slice references
# doAssert m.capture() == "foo bar"
# m.capture() is a slice into `str` 

# return tuples of bounds
doAssert m.capturesBound() == ((1, 3), (5, 7))

# return tuples of corresponding strings
doAssert m.captures(str) == ("foo", "bar")
# with string slices:
# doAssert m.captures() == ("foo", "bar")

benefits

with named groups, the tuple would have named fields, allowing dot access to fields
the tuple type is computed at compile time, depending on how many captures are found in the compile-time regex. So we get compile-time errors instead of runtime out-of-bounds errors if user code tries to access an out-of-bounds group
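
A rough sketch of the compile-time checking this relies on, using plain Nim tuples only. The values and the field names `first` and `second` are made up for illustration and are not produced by nim-regex. The point is just that indexing a tuple past its arity, or using an unknown field name, is rejected by the compiler instead of failing at runtime:

```nim
# Hypothetical capture results, hand-written here; not returned by nim-regex.
let caps = ("foo", "bar")
let named = (first: "foo", second: "bar")

# In-range index and known field name work as usual.
doAssert caps[0] == "foo"
doAssert named.second == "bar"

# Out-of-range index or unknown field is a compile-time error,
# so `compiles` reports false for both expressions.
doAssert not compiles(caps[2])
doAssert not compiles(named.third)
```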

nitely commented 6 years ago

> doAssert m.captureBound() == (1, "foo bar".len) # bounds

See RegexMatch.boundaries.

> doAssert m.capturesBound() == ((1, 3), (5, 7))

I guess I could add an API that returns the last match for every group (same as nre and re do it), that way you get a flat list.

> this is possible with string slices (I'm working on it), doesn't allocate, just uses slice references

The problem is everyone makes their own string slice, and they are not compatible with each other. At least this way you can use toOpenArray and get an allocation-free string in today's Nim.
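
A minimal sketch of that, assuming the match bounds are already known. The `1 .. 7` slice is hard-coded and `countVowels` is a made-up helper, neither is nim-regex API. `toOpenArray` passes a view of the matched range to a proc without copying the substring:

```nim
# Made-up helper: works on any run of chars, including a view into a string.
proc countVowels(chars: openArray[char]): int =
  for c in chars:
    if c in {'a', 'e', 'i', 'o', 'u'}:
      inc result

let str = " foo bar baz"
# Assumed inclusive bounds of the match "foo bar" inside `str`.
let bounds = 1 .. 7

# toOpenArray is used directly as the call argument; no new string is built.
let vowels = countVowels(str.toOpenArray(bounds.a, bounds.b))
doAssert vowels == 3
```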

> NOTE: the tuple type is computed at compile time depending on how many captures are found in input compile time regex

nim-regex is not compile time only. Plenty of people use the run-time compiling to compile user provided regexes.

The current way of passing a RegexMatch var kinda avoids allocations, since matches can be reused with a RegexMatch pool. But the current NFA-based regex is not very good at avoiding allocations, and not good overall. I'll convert the NFA into a DFA at some point, which should avoid all allocations (except when capturing).

timotheecour commented 6 years ago

will answer later but FYI on https://github.com/nim-lang/Nim/issues/8518

nitely / nim-regex

[WIP] friendlier API: `find(re"(\w+) (\w+)")` should return a tuple (named tuple if named matches are used) #16
