swiftlang / swift-experimental-string-processing

An early experimental general-purpose pattern matching engine for Swift.
Apache License 2.0

Convert the RegexBuilder overloads to use variadic generics #726

Open natecook1000 opened 8 months ago

natecook1000 commented 8 months ago

This change switches all the RegexBuilder overload-based "variadics" to use proper variadic generics. As far as our tests show, this is largely compatible with the existing API, though there is one exception, detailed below. That said, I'm a bit worried about changes in output type resolution for edge cases, particularly with capture groups nested into the various structures.
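For readers less familiar with the distinction, here is a minimal sketch of overload-based "variadics" next to a single parameter-pack function. `Component`, `combine`, and `combineAll` are hypothetical names made up for this illustration and are not the RegexBuilder API; the sketch assumes Swift 5.9 or later for parameter packs.

```swift
// Hypothetical stand-in for a regex component that carries an output type.
// This is NOT the RegexBuilder API, only an illustration of the technique.
struct Component<Output> {}

// Overload-based "variadics": one hand-written (or generated) function per
// arity, which necessarily stops at some fixed limit.
func combine<A, B>(
  _ a: Component<A>, _ b: Component<B>
) -> Component<(A, B)> { Component() }

func combine<A, B, C>(
  _ a: Component<A>, _ b: Component<B>, _ c: Component<C>
) -> Component<(A, B, C)> { Component() }

// Variadic generics: a single declaration that accepts any number of
// components and concatenates their output types into one tuple.
func combineAll<each T>(
  _ component: repeat Component<each T>
) -> Component<(repeat each T)> { Component() }

let viaOverload = combine(Component<Int>(), Component<String>())
let viaPack = combineAll(
  Component<Int>(), Component<String>(), Component<Double>(), Component<Bool>()
)
```

The pack-based version has no arity ceiling, which is why a limit like the 10-capture one discussed below would not apply to it.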


The incompatibility is for composed regexes with more than 10 capture groups, which is the current limit of our overloads in RegexBuilder. Right now, when a regex with a large number of capture groups is included in a builder-syntax closure, its captures are dropped from the output type of the resulting regex. Though this is definitely a bug in the current system, fixing it represents a source break that may or may not be visible to clients.

import RegexBuilder

let regexWithTooManyCaptures = #/(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)(k)(l)(m)/#
let dslWithTooManyCaptures = Regex {
  Capture(OneOrMore(.word))
  ":"
  regexWithTooManyCaptures
  ":"
  TryCapture<(Substring, Int)>(OneOrMore(.word)) { Int($0) }
  #/:(\d+):/#
}

The dslWithTooManyCaptures regex currently has an output type of (Substring, Substring, Int, Substring) because the captures in regexWithTooManyCaptures are left out of the resulting output type. With this change, the output type becomes (Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Substring, Int, Substring). If a client matches and only accesses output.3, the captured value will silently change with this change in API. Trying to access a capture group with a different type (e.g., output.2 is currently Int, but would change to Substring) or using the output in a typed context (e.g., passing the tuple as an initializer's parameters) would result in a compilation failure.
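To make the positional shift concrete, here is a sketch that models the two output types as plain tuples. The values are written by hand for a hypothetical input string, not produced by running either version of the library, and `S`, `input`, `current`, and `updated` are names introduced only for this illustration.

```swift
typealias S = Substring  // shorthand used only in this sketch

// Hypothetical input that the regex above would be expected to match.
let input: S = "key:abcdefghijklm:42:7:"

// Shape of the output today: the 13 literal captures are dropped.
let current: (S, S, Int, S) = (input, "key", 42, "7")

// Shape of the output with this change: all captures are present.
let updated: (S, S, S, S, S, S, S, S, S, S, S, S, S, S, S, Int, S) =
  (input, "key", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k",
   "l", "m", 42, "7")

// Same position, same static type, different value: this compiles
// in both shapes, so the change goes unnoticed.
print(current.3)  // the trailing (\d+) capture
print(updated.3)  // the (b) capture

// Same position, different static type: this line compiles against
// `current` but would fail to compile against `updated`.
let number: Int = current.2
```

In the new shape the Int capture moves to position 15 and the trailing capture to position 16.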

(Note that when converted to Regex<AnyRegexOutput>, the full set of captures is available in both the current and new versions of the API.)
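As a rough sketch of the type-erased route, the snippet below converts a small builder regex to `Regex<AnyRegexOutput>` and reads its captures by index. The regex and input here are made up for brevity. The same conversion should apply to dslWithTooManyCaptures. The sketch assumes a toolchain where RegexBuilder is available (Swift 5.7 or later; on Apple platforms this also means a recent enough deployment target).

```swift
import RegexBuilder

// A small stand-in regex with two typed captures.
let typed = Regex {
  Capture(OneOrMore(.word))
  ":"
  Capture(OneOrMore(.digit))
}

// Erase the static output type; captures are then accessed dynamically.
let erased = Regex<AnyRegexOutput>(typed)

// Collect the matched substring of every element in the output.
// Index 0 is the whole match, followed by one entry per capture group.
let captures: [Substring?] =
  "key:42".wholeMatch(of: erased)?.output.map(\.substring) ?? []

for (index, capture) in captures.enumerated() {
  print(index, capture ?? "<no match>")
}
```

Because the element count is discovered at run time, code written this way does not depend on the static tuple shape that changes in this PR.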

natecook1000 commented 6 months ago

@swift-ci Please test

natecook1000 commented 6 months ago

@swift-ci Please test