swiftlang / swift-experimental-string-processing

An early experimental general-purpose pattern matching engine for Swift.
Apache License 2.0

General ascii fast paths for character classes #644

Closed milseman closed 1 year ago

milseman commented 1 year ago

When the portion of the string being matched is ASCII, use fast ASCII character class membership tests.
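As a rough illustration of the idea (not the code in this PR), the sketch below counts `\w`-like characters in a substring. When every byte is ASCII it tests membership with integer comparisons on the UTF-8 bytes; otherwise it falls back to a `Character`-based path. The names `isASCIIWordByte` and `countWordCharacters` are hypothetical, and the slow path is only a stand-in for the engine's real character class semantics.

```swift
/// Hypothetical helper: is this ASCII byte in [0-9A-Za-z_]?
func isASCIIWordByte(_ byte: UInt8) -> Bool {
    switch byte {
    case 0x30...0x39, 0x41...0x5A, 0x5F, 0x61...0x7A:
        return true
    default:
        return false
    }
}

/// Hypothetical example: count word characters in `text`.
func countWordCharacters(in text: Substring) -> Int {
    var count = 0
    if text.utf8.allSatisfy({ $0 < 0x80 }) {
        // Fast path: the whole portion is ASCII, so bytes can be
        // classified directly without forming Characters.
        for byte in text.utf8 where isASCIIWordByte(byte) {
            count += 1
        }
    } else {
        // Slow path (stand-in semantics): walk grapheme clusters.
        for c in text where c.isLetter || c.isNumber || c == "_" {
            count += 1
        }
    }
    return count
}
```

Scanning the whole portion up front is a simplification for the example; a matcher would more plausibly make the ASCII decision per position.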

  Benchmark                               This PR   Baseline  Delta       Change
=== Regressions ======================================================================
- NotFoundAll                             7.12ms    7.03ms    89µs        1.3%
- EagarQuantWithTerminalWhole             2.64ms    2.62ms    18µs        0.7%
=== Improvements =====================================================================
- DiceRollsInTextAll                      50.3ms    64.9ms    -14.6ms     -22.5%
- EmailBuiltinCharacterClassAll           15.7ms    24.8ms    -9.06ms     -36.5%
- WordsAll                                14.8ms    22.5ms    -7.67ms     -34.1%
- BasicBuiltinCharacterClassAll           9.26ms    15.2ms    -5.96ms     -39.2%
- CompilerMessagesAll                     117ms     123ms     -5.91ms     -4.8%
- NumbersAll                              8.07ms    11.9ms    -3.87ms     -32.4%
- DiceNotation                            5.42ms    7.02ms    -1.6ms      -22.8%
- GraphemeBreakNoCapAll                   5.49ms    7ms       -1.51ms     -21.6%
- EmailRFCNoMatchesAll                    137ms     138ms     -1.16ms     -0.8%
- EmailRFCAll                             63ms      64ms      -999µs      -1.6%
- IntersectionCCC                         22.1ms    22.8ms    -671µs      -2.9%
- EmailLookaheadAll                       40.4ms    40.9ms    -487µs      -1.2%
- SubtractionCCC                          21.7ms    22.1ms    -403µs      -1.8%
- EmojiRegexAll                           73.4ms    73.8ms    -403µs      -0.5%
- InvertedCCC                             21.3ms    21.7ms    -351µs      -1.6%
- IPv4Address                             2.58ms    2.88ms    -304µs      -10.6%
- symDiffCCC                              49.4ms    49.7ms    -280µs      -0.6%
- AnchoredNotFoundWhole                   9.08ms    9.26ms    -182µs      -2.0%
- CssAll                                  3.84ms    4.02ms    -177µs      -4.4%
- CaseInsensitiveCCC                      11.9ms    12ms      -154µs      -1.3%
- BasicCCC                                10.7ms    10.8ms    -127µs      -1.2%
- HangulSyllableAll                       6.89ms    7.01ms    -121µs      -1.7%
- BasicRangeCCC                           11.1ms    11.3ms    -115µs      -1.0%
- EmailLookaheadNoMatchesAll              41.4ms    41.5ms    -109µs      -0.3%
- IPv6Address                             4.1ms     4.19ms    -82.7µs     -2.0%
- MACAddress                              3.05ms    3.11ms    -58.1µs     -1.9%
- LinesAll                                3.15ms    3.19ms    -40.1µs     -1.3%
- HangulSyllableFirst                     3.34ms    3.38ms    -38.7µs     -1.1%

milseman commented 1 year ago

Note: the times above were measured after https://github.com/apple/swift-experimental-string-processing/pull/642.

milseman commented 1 year ago

Switching the coding convention to inline the quick check and outline the slow path, along with recognizing the yes/no/maybe nature of quick checks, gives us further benefits. I also added assertions (to check behavior parity) and the first entry in the programmer's manual.
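A minimal sketch of that convention, under assumptions of my own rather than taken from the PR: the quick check returns `Bool?`, where `true`/`false` are definite answers and `nil` means "maybe, ask the slow path". The quick check is marked `@inline(__always)`, the slow path `@inline(never)`, and an `assert` compares the two whenever the quick check is definite. The function names are hypothetical, and `Character.isWholeNumber` is only a stand-in for the engine's digit semantics.

```swift
/// Hypothetical quick check for a digit at `i`.
/// Returns true/false when ASCII bytes settle it, nil for "maybe".
@inline(__always)
func quickIsDigit(_ s: String, at i: String.Index) -> Bool? {
    let utf8 = s.utf8
    guard i < utf8.endIndex else { return false }
    let byte = utf8[i]
    guard byte < 0x80 else { return nil }          // non-ASCII: maybe
    let next = utf8.index(after: i)
    if next < utf8.endIndex {
        let following = utf8[next]
        // A non-ASCII follower might combine with this scalar.
        if following >= 0x80 { return nil }
        // CR-LF forms a single grapheme cluster.
        if byte == 0x0D && following == 0x0A { return nil }
    }
    return (0x30...0x39).contains(byte)
}

/// Outlined slow path (stand-in semantics): full Character inspection.
@inline(never)
func slowIsDigit(_ s: String, at i: String.Index) -> Bool {
    guard i < s.endIndex else { return false }
    return s[i].isWholeNumber
}

func isDigit(_ s: String, at i: String.Index) -> Bool {
    if let answer = quickIsDigit(s, at: i) {
        // Behavior parity: a definite quick answer must match the slow path.
        assert(answer == slowIsDigit(s, at: i), "quick check disagrees")
        return answer
    }
    return slowIsDigit(s, at: i)
}
```

Keeping the quick check small enough to inline means the common ASCII case avoids a call, while the rarely taken slow path stays out of line so it does not bloat each call site.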

This change resulted in a robust improvement in EmailBuiltinCharacterClassAll (roughly 10 percentage points more reduction, from -36.5% to -47.0%, consistent across many runs), while other benchmarks were largely unaffected.

New overall results:

  Benchmark                               This PR   Baseline  Delta       Change
=== Regressions ======================================================================
- EmailRFCNoMatchesAll                    140ms     133ms     6.75ms      5.1%
- EmailRFCAll                             64.4ms    61.9ms    2.52ms      4.1%
- EmojiRegexAll                           73.7ms    71.2ms    2.47ms      3.5%
- symDiffCCC                              49.9ms    48.6ms    1.31ms      2.7%
- EmailLookaheadNoMatchesAll              41.7ms    40.4ms    1.3ms       3.2%
- EmailLookaheadAll                       40.7ms    39.5ms    1.16ms      2.9%
- InvertedCCC                             21.8ms    20.9ms    929µs       4.5%
- BasicRangeCCC                           11.4ms    11ms      373µs       3.4%
- ReluctantQuantWithTerminalWhole         9.42ms    9.09ms    333µs       3.7%
- BasicCCC                                11ms      10.6ms    330µs       3.1%
- EmailLookaheadList                      10ms      9.72ms    316µs       3.3%
- ReluctantQuantWhole                     14.2ms    13.8ms    305µs       2.2%
- CaseInsensitiveCCC                      12.1ms    11.9ms    223µs       1.9%
- AnchoredNotFoundWhole                   9.17ms    8.97ms    200µs       2.2%
- LiteralSearchAll                        6.78ms    6.58ms    199µs       3.0%
- IntersectionCCC                         22.3ms    22.1ms    195µs       0.9%
- SubtractionCCC                          21.8ms    21.6ms    189µs       0.9%
- NotFoundAll                             7.22ms    7.03ms    188µs       2.7%
- LiteralSearchNotFoundAll                6.56ms    6.4ms     156µs       2.4%
- IPv6Address                             4.14ms    3.98ms    151µs       3.8%
- HangulSyllableAll                       6.99ms    6.85ms    138µs       2.0%
- MACAddress                              3.07ms    2.96ms    106µs       3.6%
- EagarQuantWithTerminalWhole             2.67ms    2.56ms    104µs       4.1%
- LinesAll                                3.19ms    3.12ms    77.1µs      2.5%
- HangulSyllableFirst                     3.36ms    3.3ms     64µs        1.9%
=== Improvements =====================================================================
- DiceRollsInTextAll                      48.2ms    62.2ms    -14ms       -22.6%
- EmailBuiltinCharacterClassAll           13ms      24.6ms    -11.6ms     -47.0%
- WordsAll                                14.2ms    21.7ms    -7.47ms     -34.5%
- BasicBuiltinCharacterClassAll           8.73ms    14.6ms    -5.87ms     -40.2%
- NumbersAll                              7.67ms    11.5ms    -3.85ms     -33.4%
- DiceNotation                            5.21ms    6.73ms    -1.52ms     -22.6%
- GraphemeBreakNoCapAll                   5.35ms    6.75ms    -1.4ms      -20.7%
- CompilerMessagesAll                     117ms     119ms     -1.37ms     -1.2%
- IPv4Address                             2.52ms    2.76ms    -238µs      -8.6%
- CssAll                                  3.82ms    3.94ms    -120µs      -3.0%

milseman commented 1 year ago

Converting to draft, as I still have a little more refactoring to do.

milseman commented 1 year ago

@swift-ci please test

milseman commented 1 year ago

@swift-ci please test

milseman commented 1 year ago

I made an ASCII.swift file in the Unicode folder, and also put some of the quick checks directly on String so that we can avoid @_effects(releasenone).

@swift-ci please test