Closed natecook1000 closed 1 year ago
@swift-ci Please test
These are the benchmark results I see — I had to modify the AnchoredNotFound
benchmark to include a firstMatch
search, and it shows a big improvement. AFAICT the rest are noise?
=== Regressions ======================================================================
- CompilerMessages_All 99.1ms 97.4ms 1.66ms 1.7%
- EmojiRegex_All 64.1ms 63.1ms 932µs 1.5%
- EmojiRegex_All_Scalar 41.4ms 40.5ms 923µs 2.3%
- EmailRFC_All_Scalar 46.9ms 46.1ms 746µs 1.6%
- EmailLookaheadNoMatches_All 39ms 38.4ms 539µs 1.4%
- InvertedCCC_All_Scalar 22.2ms 21.7ms 426µs 2.0%
- ReluctantQuant_Whole 9.57ms 9.16ms 406µs 4.4%
- EmailLookahead_All 40.5ms 40.1ms 370µs 0.9%
- SubtractionCCC_All 20.2ms 19.8ms 347µs 1.7%
- IntersectionCCC_All 20.5ms 20.2ms 318µs 1.6%
- LiteralSearch_All_Scalar 4.51ms 4.34ms 172µs 4.0%
- Css_All_Scalar 3.01ms 2.87ms 138µs 4.8%
- AnchoredNotFound_All_Scalar 8.63ms 8.5ms 132µs 1.6%
- Numbers_All 6.12ms 6ms 125µs 2.1%
- ReluctantQuantWithTerminal_Whole_Scalar 5.45ms 5.37ms 82.2µs 1.5%
- ReluctantQuantWithTerminal_Whole 5.44ms 5.38ms 55.3µs 1.0%
- IPv4Address_Scalar 2.32ms 2.27ms 50.3µs 2.2%
=== Improvements =====================================================================
- AnchoredNotFound_First 9.13ms 13.3ms -4.16ms -31.3%
- AnchoredNotFound_First_Scalar 5.62ms 8.63ms -3.01ms -34.9%
- EmailRFCNoMatches_All_Scalar 126ms 129ms -2.11ms -1.6%
- DiceRollsInText_All_Scalar 41.8ms 42.4ms -645µs -1.5%
- GraphemeBreakNoCap_All_Scalar 3.54ms 3.89ms -346µs -8.9%
- BasicRangeCCC_All_Scalar 9.5ms 9.84ms -339µs -3.4%
- BasicCCC_All_Scalar 9.03ms 9.37ms -335µs -3.6%
- InvertedCCC_All 21.5ms 21.7ms -263µs -1.2%
- EmailLookaheadList_Scalar 5.02ms 5.24ms -226µs -4.3%
- LiteralSearchNotFound_All_Scalar 4.16ms 4.38ms -221µs -5.0%
- GraphemeBreakNoCap_All 4.25ms 4.45ms -208µs -4.7%
- EmailBuiltinCharacterClass_All_Scalar 10.5ms 10.7ms -199µs -1.9%
- BasicRangeCCC_All 9.48ms 9.67ms -195µs -2.0%
- CaseInsensitiveCCC_All 10.2ms 10.4ms -185µs -1.8%
- IPv6Address_Scalar 2.76ms 2.93ms -170µs -5.8%
- BasicCCC_All 8.99ms 9.15ms -160µs -1.8%
- LiteralSearchNotFound_All 5.1ms 5.26ms -156µs -3.0%
- IPv4Address 2.53ms 2.66ms -133µs -5.0%
- Numbers_All_Scalar 5.28ms 5.41ms -122µs -2.3%
- CaseInsensitiveCCC_All_Scalar 10.2ms 10.3ms -119µs -1.1%
- HangulSyllable_All_Scalar 5.09ms 5.21ms -119µs -2.3%
- LiteralSearch_All 5.27ms 5.39ms -115µs -2.1%
- Lines_All_Scalar 1.88ms 1.99ms -107µs -5.4%
- HangulSyllable_All 6.01ms 6.09ms -79.4µs -1.3%
- Lines_All 1.9ms 1.92ms -18.9µs -1.0%
When a regex is anchored to the start of a subject, there's no need to search throughout a string for the pattern when searching for the first match: a prefix match is sufficient.
This adds a regex compilation-time check about whether a match can only be found at the start of a subject, and then uses that to choose whether to defer to
prefixMatch
from withinfirstMatch
.rdar://111392228