unicode-rs / unicode-segmentation

Grapheme Cluster and Word boundaries according to UAX#29 rules
https://unicode-rs.github.io/unicode-segmentation

Modify benchmarks to compare against stdlib functions #133

Closed Jonas-Heinrich closed 6 months ago

Jonas-Heinrich commented 6 months ago

This commit refactors and expands the microbenchmarks to measure the performance cost of full Unicode handling. unicode-segmentation's functions are expected to be slower than their stdlib counterparts because they account for grapheme clusters; the question is how much slower.
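To illustrate what the stdlib side of such a comparison does and does not handle, here is a minimal std-only sketch. The helper names `scalar_count` and `whitespace_words` are hypothetical, and the assumption that the "scalar" benchmarks map to `str::chars` and `str::split_whitespace` is mine, not confirmed by the PR.

```rust
// Std-only baselines. Which stdlib functions the PR's "scalar" benchmarks
// actually call is an assumption made for this sketch.

/// Counts Unicode scalar values, which is what `str::chars` iterates over.
fn scalar_count(text: &str) -> usize {
    text.chars().count()
}

/// Splits on Unicode whitespace only, so punctuation stays attached to words.
fn whitespace_words(text: &str) -> Vec<&str> {
    text.split_whitespace().collect()
}

fn main() {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT is one user-perceived
    // character but two scalar values.
    let decomposed = "e\u{301}";
    println!("scalar values: {}", scalar_count(decomposed));

    // Whitespace splitting keeps the comma and exclamation mark.
    println!("{:?}", whitespace_words("Hello, world!"));
}
```

By contrast, unicode-segmentation's `graphemes(true)` treats the decomposed pair as a single cluster, and `unicode_words()` yields `Hello` and `world` without the punctuation. That extra boundary analysis is the work the benchmarks are meant to price.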

Jonas-Heinrich commented 6 months ago

Results on M1 Pro:

cargo criterion
chars/grapheme/arabic        time:   [225.81 µs 227.21 µs 228.99 µs]
chars/grapheme/english       time:   [321.12 µs 324.93 µs 332.17 µs]
chars/grapheme/hindi         time:   [310.65 µs 313.42 µs 317.46 µs]
chars/grapheme/japanese      time:   [263.73 µs 264.32 µs 265.04 µs]
chars/grapheme/korean        time:   [374.51 µs 375.28 µs 376.23 µs]
chars/grapheme/mandarin      time:   [181.40 µs 181.92 µs 182.43 µs]
chars/grapheme/russian       time:   [223.38 µs 225.97 µs 230.94 µs]
chars/grapheme/source_code   time:   [331.74 µs 339.59 µs 350.17 µs]
chars/scalar/arabic          time:   [34.403 µs 34.629 µs 34.872 µs]
chars/scalar/english         time:   [29.143 µs 29.238 µs 29.333 µs]
chars/scalar/hindi           time:   [32.569 µs 32.903 µs 33.253 µs]
chars/scalar/japanese        time:   [19.473 µs 19.578 µs 19.705 µs]
chars/scalar/korean          time:   [28.406 µs 28.835 µs 29.526 µs]
chars/scalar/mandarin        time:   [18.407 µs 18.524 µs 18.688 µs]
chars/scalar/russian         time:   [33.282 µs 33.840 µs 34.721 µs]
chars/scalar/source_code     time:   [29.295 µs 29.410 µs 29.545 µs]

Gnuplot not found, using plotters backend
word_bounds/grapheme/arabic        time:   [307.01 µs 307.80 µs 308.64 µs]
word_bounds/grapheme/english       time:   [546.69 µs 548.37 µs 550.20 µs]
word_bounds/grapheme/hindi         time:   [258.34 µs 259.83 µs 261.33 µs]
word_bounds/grapheme/japanese      time:   [451.61 µs 452.79 µs 454.02 µs]
word_bounds/grapheme/korean        time:   [186.72 µs 187.40 µs 188.27 µs]
word_bounds/grapheme/mandarin      time:   [302.78 µs 303.41 µs 304.11 µs]
word_bounds/grapheme/russian       time:   [213.85 µs 214.64 µs 215.40 µs]
word_bounds/grapheme/source_code   time:   [645.49 µs 647.82 µs 650.39 µs]

Gnuplot not found, using plotters backend
words/grapheme/arabic        time:   [408.06 µs 409.05 µs 410.07 µs]
words/grapheme/english       time:   [565.94 µs 570.32 µs 576.88 µs]
words/grapheme/hindi         time:   [288.32 µs 289.24 µs 290.26 µs]
words/grapheme/japanese      time:   [769.22 µs 773.32 µs 781.58 µs]
words/grapheme/korean        time:   [239.53 µs 240.74 µs 241.96 µs]
words/grapheme/mandarin      time:   [637.44 µs 638.90 µs 640.41 µs]
words/grapheme/russian       time:   [238.54 µs 239.48 µs 240.84 µs]
words/grapheme/source_code   time:   [672.63 µs 674.83 µs 677.05 µs]
words/scalar/arabic          time:   [75.142 µs 75.378 µs 75.636 µs]
words/scalar/english         time:   [91.580 µs 92.256 µs 93.210 µs]
words/scalar/hindi           time:   [46.629 µs 46.863 µs 47.107 µs]
words/scalar/japanese        time:   [64.907 µs 65.176 µs 65.509 µs]
words/scalar/korean          time:   [48.730 µs 49.012 µs 49.296 µs]
words/scalar/mandarin        time:   [35.407 µs 35.436 µs 35.469 µs]
words/scalar/russian         time:   [71.672 µs 71.774 µs 71.885 µs]
words/scalar/source_code     time:   [100.26 µs 100.49 µs 100.73 µs]
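
To put a number on "how much", one option is to divide each grapheme-aware median by its scalar counterpart. The sketch below does that for the `chars` and `words` groups, using the middle value of each criterion interval reported above. The helper `slowdown` and the constant names are hypothetical. `word_bounds` is left out because the run has no scalar counterpart for it.

```rust
/// Ratio of the grapheme-aware median to the scalar (stdlib) median.
fn slowdown(grapheme_us: f64, scalar_us: f64) -> f64 {
    grapheme_us / scalar_us
}

/// (input, grapheme median in µs, scalar median in µs) for the `chars` group.
const CHARS: [(&str, f64, f64); 8] = [
    ("arabic", 227.21, 34.629),
    ("english", 324.93, 29.238),
    ("hindi", 313.42, 32.903),
    ("japanese", 264.32, 19.578),
    ("korean", 375.28, 28.835),
    ("mandarin", 181.92, 18.524),
    ("russian", 225.97, 33.840),
    ("source_code", 339.59, 29.410),
];

/// Same layout for the `words` group.
const WORDS: [(&str, f64, f64); 8] = [
    ("arabic", 409.05, 75.378),
    ("english", 570.32, 92.256),
    ("hindi", 289.24, 46.863),
    ("japanese", 773.32, 65.176),
    ("korean", 240.74, 49.012),
    ("mandarin", 638.90, 35.436),
    ("russian", 239.48, 71.774),
    ("source_code", 674.83, 100.49),
];

fn main() {
    for (group, rows) in [("chars", CHARS), ("words", WORDS)] {
        for (input, grapheme_us, scalar_us) in rows {
            // Print one slowdown factor per benchmark input.
            println!("{}/{}: {:.1}x", group, input, slowdown(grapheme_us, scalar_us));
        }
    }
}
```

These ratios come from a single run on one machine, so they are a rough indication rather than a stable figure.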