nitely / nim-regex

Pure Nim regex engine. Guarantees linear time matching
https://nitely.github.io/nim-regex/
MIT License

[WIP] friendlier API: `find(re"(\w+) (\w+)")` should return a tuple (named tuple if named matches are used) #16

Closed: timotheecour closed this issue 6 years ago

timotheecour commented 6 years ago

The following would result in a much easier-to-use API:

let str = " foo bar baz"
let m = str.find(re"(\w+) (\w+)")
doAssert m.ok
doAssert m.captureBound() == (1, "foo bar".len) # bounds
doAssert m.capture(str) == "foo bar"

# this is possible with string slices (I'm working on it), doesn't allocate, just uses slice references
# doAssert m.capture() == "foo bar"
# m.capture() is a slice into `str` 

# return tuples of bounds
doAssert m.capturesBound() == ((1, 3), (5, 7))

# return tuples of corresponding strings
doAssert m.captures(str) == ("foo", "bar")
# with string slices:
# doAssert m.captures() == ("foo", "bar")

benefits

nitely commented 6 years ago

doAssert m.captureBound() == (1, "foo bar".len) # bounds

See RegexMatch.boundaries.

doAssert m.capturesBound() == ((1, 3), (5, 7))

I guess I could add an API that returns the last match for every group (the same way nre and re do it). That way you get a flat list.
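A minimal sketch of what "last match for every group" could mean, using plain data rather than the library itself. The proc name `lastCaptures`, the nested `seq` layout, and the empty-span convention for unmatched groups are all assumptions made for illustration, not part of nim-regex:

```nim
# Hypothetical helper: assumes each group keeps every span it captured, in match order.
proc lastCaptures(groups: seq[seq[Slice[int]]]): seq[Slice[int]] =
  ## Returns one span per group: the last capture,
  ## or an empty span (0 .. -1) if the group never matched.
  for spans in groups:
    if spans.len > 0:
      result.add(spans[^1])
    else:
      result.add(0 .. -1)

let text = "ab cd ef"
# Illustrative spans: the first group captured three times, the second never matched.
let groups = @[@[0 .. 1, 3 .. 4, 6 .. 7], newSeq[Slice[int]]()]
let flat = lastCaptures(groups)

doAssert flat.len == 2
doAssert text[flat[0]] == "ef"   # last capture of the first group
doAssert flat[1].len == 0        # unmatched group yields an empty span
```

The result has exactly one entry per group, which is what makes a fixed-shape return value (such as the tuples proposed above) possible when a group is repeated.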

this is possible with string slices (I'm working on it), doesn't allocate, just uses slice references

The problem is that everyone makes their own string slice type, and they are not compatible with each other. At least with boundaries you can use toOpenArray and get an allocation-free string in today's Nim.
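The `toOpenArray` approach can be sketched with the standard library alone. The proc `countVowels` and the hard-coded bounds are illustrative assumptions; in practice the bounds would come from a match's boundaries:

```nim
# Illustrative consumer: accepts any contiguous run of chars without owning it.
proc countVowels(s: openArray[char]): int =
  for c in s:
    if c in {'a', 'e', 'i', 'o', 'u'}:
      inc result

let str = " foo bar baz"
# Inclusive bounds of "foo bar" inside `str` (assumed here, not computed by a regex).
let bounds = 1 .. 7

# toOpenArray passes a view of str[1..7] to the proc; no new string is allocated.
let vowels = countVowels(str.toOpenArray(bounds.a, bounds.b))
doAssert vowels == 3

# Slicing with `[]` still works when an owned copy is actually needed.
doAssert str[bounds] == "foo bar"
```

Note that `toOpenArray` can only be used directly as a call argument, so the view cannot be stored in a variable. That is one reason it does not fully replace a dedicated string slice type.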

NOTE: the tuple type is computed at compile time depending on how many captures are found in input compile time regex

nim-regex is not compile-time only. Plenty of people use run-time compilation to compile user-provided regexes.

The current way of passing a RegexMatch var kind of avoids allocations, since matches can be reused with a RegexMatch pool. But the current NFA-based regex is not very good at avoiding allocations, and not good overall. I'll convert the NFA into a DFA at some point, which should avoid all allocations (except when capturing).

timotheecour commented 6 years ago

Will answer later, but FYI see https://github.com/nim-lang/Nim/issues/8518