nitely commented 2 years ago

API spec:

func re2(s: string): Regex2
func re2(s: static string): static[Regex2]
func group(m: RegexMatch2; i: int): Slice[int]
func group(m: RegexMatch2; s: string): Slice[int]
func groupCount(m: RegexMatch2): int
func groupNames(m: RegexMatch2): seq[string]
func match(s: string; pattern: Regex2): bool
func match(s: string; pattern: Regex2; m: var RegexMatch2; start = 0): bool
[func,iterator] findAll(s: string; pattern: Regex2; start = 0): seq[RegexMatch2]
func find(s: string; pattern: Regex2; m: var RegexMatch2; start = 0): bool
[func,iterator] capture(s: string; pattern: Regex2): seq[string]
func contains(s: string; pattern: Regex2): bool
[func,iterator] split(s: string; sep: Regex2): seq[string]
[func,iterator] splitIncl(s: string; sep: Regex2): seq[string]
func startsWith(s: string; pattern: Regex2; start = 0): bool
func endsWith(s: string; pattern: Regex2): bool
func replace(s: string; pattern: Regex2; by: string; limit = 0): string
func replace(s: string; pattern: Regex2; by: proc (m: RegexMatch2; s: string): string; limit = 0): string 
func isInitialized(re: Regex2): bool
func escapeRe(s: string): string
macro match(text: string; regex: RegexLit; body: untyped): untyped

The "captures all group repetitions (not just the last one)" feature is removed; only the last repetition is captured. This is a breaking change, and it will break some of the APIs. The rest of the APIs are deprecated or removed.

nitely commented 11 months ago

Changes to support both the old APIs and new APIs for a while:

Regex -> Regex2
RegexMatch -> RegexMatch2
re -> re2

nitely commented 11 months ago

#122 is merged

nitely commented 11 months ago

I think I've not given the rationale for removing the "captures all group repetitions (not just the last one)" feature anywhere, so I'll do it here.

In order to capture all of the repetitions in re"(\w)+", a full parse tree of submatch (capture group) boundaries needs to be generated. The tree is usually small, except when it's not. The main issue is that the space complexity is O(N*M), where N is the text length and M is the regex length. While this is not unbounded, it may be prohibitive, more so when matching untrusted text. Keeping only the last repetition of each submatch makes the space complexity O(N*M), where N is now the regex length and M is the number of submatches (both usually known at compile time), so it no longer grows with the text length.

Why not provide both options? It's a lot of additional complexity.

What if I need all captures? You can do what most other languages do: match first, and then use findAll to collect the repetitions.
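
A minimal sketch of that match-then-findAll approach, assuming the third-party regex package (nim-regex) is installed and exposes the new API listed above. The sample text, the patterns, and the variable names are only illustrative:

```nim
import regex  # third-party package (nim-regex), not part of the stdlib

let text = "nim c --styleCheck:hint --colors:off regex.nim"

# Step 1: validate the whole input with match.
# The repeated group (group 0) keeps only its last repetition.
var m = RegexMatch2()
doAssert match(text, re2"nim c (?:--(\w+:\w+) *)+ (\w+).nim", m)
let lastOpt = text[m.group(0)]
let fileName = text[m.group(1)]

# Step 2: run findAll with just the repeated part of the pattern
# to collect every repetition.
var allOpts: seq[string]
for fm in findAll(text, re2"--(\w+:\w+)"):
  allOpts.add text[fm.group(0)]
```

Here match confirms the text has the expected shape, and the second pass only pays for the repetitions that are actually present, so nothing proportional to the text length has to be kept around during the match itself.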

nitely / nim-regex

New API #111
