Closed by Jonas-Heinrich 6 months ago
Results on M1 Pro:
cargo criterion
chars/grapheme/arabic time: [225.81 µs 227.21 µs 228.99 µs]
chars/grapheme/english time: [321.12 µs 324.93 µs 332.17 µs]
chars/grapheme/hindi time: [310.65 µs 313.42 µs 317.46 µs]
chars/grapheme/japanese time: [263.73 µs 264.32 µs 265.04 µs]
chars/grapheme/korean time: [374.51 µs 375.28 µs 376.23 µs]
chars/grapheme/mandarin time: [181.40 µs 181.92 µs 182.43 µs]
chars/grapheme/russian time: [223.38 µs 225.97 µs 230.94 µs]
chars/grapheme/source_code
time: [331.74 µs 339.59 µs 350.17 µs]
chars/scalar/arabic time: [34.403 µs 34.629 µs 34.872 µs]
chars/scalar/english time: [29.143 µs 29.238 µs 29.333 µs]
chars/scalar/hindi time: [32.569 µs 32.903 µs 33.253 µs]
chars/scalar/japanese time: [19.473 µs 19.578 µs 19.705 µs]
chars/scalar/korean time: [28.406 µs 28.835 µs 29.526 µs]
chars/scalar/mandarin time: [18.407 µs 18.524 µs 18.688 µs]
chars/scalar/russian time: [33.282 µs 33.840 µs 34.721 µs]
chars/scalar/source_code
time: [29.295 µs 29.410 µs 29.545 µs]
Gnuplot not found, using plotters backend
word_bounds/grapheme/arabic
time: [307.01 µs 307.80 µs 308.64 µs]
word_bounds/grapheme/english
time: [546.69 µs 548.37 µs 550.20 µs]
word_bounds/grapheme/hindi
time: [258.34 µs 259.83 µs 261.33 µs]
word_bounds/grapheme/japanese
time: [451.61 µs 452.79 µs 454.02 µs]
word_bounds/grapheme/korean
time: [186.72 µs 187.40 µs 188.27 µs]
word_bounds/grapheme/mandarin
time: [302.78 µs 303.41 µs 304.11 µs]
word_bounds/grapheme/russian
time: [213.85 µs 214.64 µs 215.40 µs]
word_bounds/grapheme/source_code
time: [645.49 µs 647.82 µs 650.39 µs]
Gnuplot not found, using plotters backend
words/grapheme/arabic time: [408.06 µs 409.05 µs 410.07 µs]
words/grapheme/english time: [565.94 µs 570.32 µs 576.88 µs]
words/grapheme/hindi time: [288.32 µs 289.24 µs 290.26 µs]
words/grapheme/japanese time: [769.22 µs 773.32 µs 781.58 µs]
words/grapheme/korean time: [239.53 µs 240.74 µs 241.96 µs]
words/grapheme/mandarin time: [637.44 µs 638.90 µs 640.41 µs]
words/grapheme/russian time: [238.54 µs 239.48 µs 240.84 µs]
words/grapheme/source_code
time: [672.63 µs 674.83 µs 677.05 µs]
words/scalar/arabic time: [75.142 µs 75.378 µs 75.636 µs]
words/scalar/english time: [91.580 µs 92.256 µs 93.210 µs]
words/scalar/hindi time: [46.629 µs 46.863 µs 47.107 µs]
words/scalar/japanese time: [64.907 µs 65.176 µs 65.509 µs]
words/scalar/korean time: [48.730 µs 49.012 µs 49.296 µs]
words/scalar/mandarin time: [35.407 µs 35.436 µs 35.469 µs]
words/scalar/russian time: [71.672 µs 71.774 µs 71.885 µs]
words/scalar/source_code
time: [100.26 µs 100.49 µs 100.73 µs]
This commit refactors and expands the microbenchmarks in order to evaluate the performance hit of handling full Unicode. `unicode-segmentation`'s functions are expected to be slower since they consider graphemes; the question is just how much slower.
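To put a number on "how much", the medians reported above can be turned into slowdown ratios. The snippet below is only a rough sketch: it assumes that each `grapheme` benchmark and its `scalar` counterpart process the same input, so that dividing one median by the other is meaningful. The names `MEDIANS_US` and `slowdown` are hypothetical helpers for illustration, not part of the benchmark code. `word_bounds` is left out because the run above has no scalar variant for it.

```rust
/// Median times in microseconds, copied from the M1 Pro results above:
/// (benchmark, language, grapheme median, scalar median).
const MEDIANS_US: [(&str, &str, f64, f64); 16] = [
    ("chars", "arabic", 227.21, 34.629),
    ("chars", "english", 324.93, 29.238),
    ("chars", "hindi", 313.42, 32.903),
    ("chars", "japanese", 264.32, 19.578),
    ("chars", "korean", 375.28, 28.835),
    ("chars", "mandarin", 181.92, 18.524),
    ("chars", "russian", 225.97, 33.840),
    ("chars", "source_code", 339.59, 29.410),
    ("words", "arabic", 409.05, 75.378),
    ("words", "english", 570.32, 92.256),
    ("words", "hindi", 289.24, 46.863),
    ("words", "japanese", 773.32, 65.176),
    ("words", "korean", 240.74, 49.012),
    ("words", "mandarin", 638.90, 35.436),
    ("words", "russian", 239.48, 71.774),
    ("words", "source_code", 674.83, 100.49),
];

/// How many times slower the grapheme variant is than the scalar one.
/// Assumption: both variants were measured on the same input.
fn slowdown(grapheme_us: f64, scalar_us: f64) -> f64 {
    grapheme_us / scalar_us
}

fn main() {
    for (bench, lang, grapheme_us, scalar_us) in MEDIANS_US {
        println!(
            "{}/{}: {:.1}x slower",
            bench,
            lang,
            slowdown(grapheme_us, scalar_us)
        );
    }
}
```

On this single run the ratios fall roughly between 3x and 18x, depending on the language and benchmark. Results from other machines or inputs may differ.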