Increase the #[inline] opportunities - 15-40% performance improvements

timClicks commented 3 years ago

This PR increases the number of (internal) functions that are given an opportunity to be inlined during grapheme detection. Benchmarking indicates that this change results in a significant performance improvement (see below).

This PR also replaces bencher with criterion to enable continued performance monitoring.

$ cargo criterion
   Compiling unicode-segmentation v1.7.1 (/home/tim/Open/unicode-segmentation)
    Finished bench [optimized] target(s) in 2.96s
Gnuplot not found, using plotters backend
Benchmarking graphemes_arabic: Collecting 100 samples in estimated 6.2276 s                                                                            graphemes_arabic         time:   [300.59 us 303.65 us 306.84 us]
                        change: [-28.120% -27.089% -26.073%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking graphemes_english: Collecting 100 samples in estimated 5.8875 s                                                                           graphemes_english        time:   [341.92 us 344.80 us 348.53 us]
                        change: [-43.442% -42.572% -41.739%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking graphemes_hindi: Collecting 100 samples in estimated 5.4019 s (                                                                           graphemes_hindi          time:   [340.59 us 343.50 us 346.90 us]
                        change: [-20.806% -19.540% -18.205%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking grapheme japanese: Collecting 100 samples in estimated 5.2540                                                                            graphemes_japanese       time:   [336.86 us 338.72 us 341.01 us]
                        change: [-23.941% -22.756% -21.497%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking graphemes_korean: Collecting 100 samples in estimated 6.8800 s                                                                            graphemes_korean         time:   [648.18 us 653.10 us 659.09 us]
                        change: [-18.605% -17.519% -16.332%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking graphemes_mandarin: Collecting 100 samples in estimated 6.0117                                                                            graphemes_mandarin       time:   [215.14 us 217.71 us 220.84 us]
                        change: [-29.560% -28.320% -27.072%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking graphemes_russian: Collecting 100 samples in estimated 6.3045 s                                                                           graphemes_russian        time:   [291.04 us 296.19 us 301.67 us]
                        change: [-28.801% -27.643% -26.482%] (p = 0.00 < 0.05)
                        Performance has improved.

Benchmarking graphemes_source_code: Collecting 100 samples in estimated 6.40                                                                           graphemes_source_code    time:   [378.38 us 383.62 us 389.18 us]
                        change: [-36.523% -35.751% -35.060%] (p = 0.00 < 0.05)
                        Performance has improved.

archseer commented 3 years ago

Could a new release be cut to ship these changes? It's a significant difference.

Manishearth commented 3 years ago

Published 1.8.0!

unicode-rs / unicode-segmentation

Increase the #[inline] opportunities - 15-40% performance improvements #100