rust-random / rand

A Rust library for random number generation.
https://crates.io/crates/rand

Migrate remaining benchmarks to Criterion #1490

Closed by dhardy 2 months ago

dhardy commented 2 months ago

Summary

Translate everything still using the old test harness to Criterion.

Motivation

Remove the dependency on the (mostly deprecated) standard test harness. This also enables Criterion's command-line arguments, which the Rust test harness would otherwise intercept.

Closes #1039.

Details

Mostly these are straightforward translations or simplifications: Criterion can benchmark ~1 ns functions well enough that we no longer need 1000 repetitions inside our benchmark functions.
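To illustrate why in-function repetition becomes unnecessary once the harness itself runs the closure many times, here is a minimal std-only sketch of timing a batch of calls and dividing by the count. It is not Criterion's implementation (Criterion adds warm-up, sampling and statistical analysis on top); `ns_per_call` and the LCG closure are hypothetical names used only for this example.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Estimate the cost of one call to `f` by timing `iters` calls and dividing.
/// `black_box` keeps the optimizer from discarding the unused results.
fn ns_per_call<F: FnMut() -> u64>(mut f: F, iters: u64) -> f64 {
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed().as_nanos() as f64 / iters as f64
}

fn main() {
    // Hypothetical stand-in for a fast generator: one 64-bit LCG step per call.
    let mut state = 1u64;
    let step = move || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        state
    };

    // The batching lives in the harness, so the benchmarked closure
    // only has to produce a single value.
    let ns = ns_per_call(step, 10_000_000);
    println!("~{ns:.3} ns per call");
}
```

Because the repetition happens outside the closure, the reported figure is already a per-call time and needs no division by a repetition count afterwards.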

I have moved some benchmarks into benchmark groups, and for some I have used shorter warm-up/measurement durations, since the defaults are quite conservative.

Many of the new results match up very well with those from the old test framework. Others don't: in particular, the generators benchmarks (especially the byte benches) are notably slower, though relative performance between generators is roughly the same.
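When comparing the two listings below, note that the units differ: the old harness reports ns per iteration and MB/s (decimal), while Criterion reports time per call and GiB/s (binary). The sketch below converts one pair of figures to a common basis. It assumes the old byte benchmarks filled a 1024-byte buffer 1000 times per iteration and the new ones fill it once; those sizes are inferred from the reported numbers rather than stated in this description, and the helper names are hypothetical.

```rust
/// Throughput in decimal megabytes per second (10^6 bytes/s),
/// the unit printed by the old test harness.
fn mb_per_s(bytes: f64, nanos: f64) -> f64 {
    bytes / nanos * 1e3
}

/// Throughput in binary gibibytes per second (2^30 bytes/s),
/// the unit printed by Criterion.
fn gib_per_s(bytes: f64, nanos: f64) -> f64 {
    bytes / nanos * 1e9 / (1u64 << 30) as f64
}

fn main() {
    // Assumed workload sizes, inferred from the figures in the listings.
    let old_bytes = 1024.0 * 1000.0; // 1000 fills of a 1024-byte buffer
    let new_bytes = 1024.0; // a single fill

    // Times for gen_bytes_chacha12 (old) and gen_bytes/chacha12 (new).
    let old_ns = 217_351.58;
    let new_ns = 320.54;

    println!(
        "old: {:.0} MB/s = {:.3} GiB/s ({:.1} ns per fill)",
        mb_per_s(old_bytes, old_ns),
        gib_per_s(old_bytes, old_ns),
        old_ns / 1000.0
    );
    println!(
        "new: {:.3} GiB/s ({:.1} ns per fill)",
        gib_per_s(new_bytes, new_ns),
        new_ns
    );
}
```

Under these assumptions both printed throughputs reproduce the values in the listings, which suggests the slowdown in the byte benches is real and not an artifact of the unit change.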

Old generators benchmark results

test gen_bytes_chacha12      ... bench:     217,351.58 ns/iter (+/- 1,433.82) = 4711 MB/s
test gen_bytes_chacha20      ... bench:     357,241.45 ns/iter (+/- 3,830.55) = 2866 MB/s
test gen_bytes_chacha8       ... bench:     147,663.57 ns/iter (+/- 872.74) = 6934 MB/s
test gen_bytes_os            ... bench:   2,039,790.30 ns/iter (+/- 9,026.71) = 502 MB/s
test gen_bytes_pcg32         ... bench:     297,029.40 ns/iter (+/- 2,519.55) = 3447 MB/s
test gen_bytes_pcg64         ... bench:     168,586.07 ns/iter (+/- 575.08) = 6074 MB/s
test gen_bytes_pcg64dxsm     ... bench:     157,215.12 ns/iter (+/- 439.56) = 6513 MB/s
test gen_bytes_pcg64mcg      ... bench:     121,489.14 ns/iter (+/- 109.97) = 8428 MB/s
test gen_bytes_small         ... bench:      90,309.88 ns/iter (+/- 878.92) = 11338 MB/s
test gen_bytes_std           ... bench:     217,169.73 ns/iter (+/- 1,440.60) = 4715 MB/s
test gen_bytes_step          ... bench:      20,989.58 ns/iter (+/- 156.67) = 48787 MB/s
test gen_bytes_thread        ... bench:     226,905.33 ns/iter (+/- 1,608.46) = 4512 MB/s
test gen_u32_chacha12        ... bench:       1,103.66 ns/iter (+/- 8.22) = 3626 MB/s
test gen_u32_chacha20        ... bench:       1,778.53 ns/iter (+/- 13.89) = 2249 MB/s
test gen_u32_chacha8         ... bench:         840.26 ns/iter (+/- 9.14) = 4761 MB/s
test gen_u32_os              ... bench:     287,570.50 ns/iter (+/- 3,060.80) = 13 MB/s
test gen_u32_pcg32           ... bench:       1,037.70 ns/iter (+/- 4.48) = 3857 MB/s
test gen_u32_pcg64           ... bench:       1,326.92 ns/iter (+/- 8.72) = 3016 MB/s
test gen_u32_pcg64dxsm       ... bench:       1,349.86 ns/iter (+/- 6.69) = 2965 MB/s
test gen_u32_pcg64mcg        ... bench:         934.49 ns/iter (+/- 5.20) = 4282 MB/s
test gen_u32_small           ... bench:         752.43 ns/iter (+/- 6.05) = 5319 MB/s
test gen_u32_std             ... bench:       1,104.51 ns/iter (+/- 17.14) = 3623 MB/s
test gen_u32_step            ... bench:           0.41 ns/iter (+/- 0.00) = 4000000 MB/s
test gen_u32_thread          ... bench:       1,244.38 ns/iter (+/- 23.10) = 3215 MB/s
test gen_u64_chacha12        ... bench:       1,847.26 ns/iter (+/- 4.20) = 4331 MB/s
test gen_u64_chacha20        ... bench:       2,938.12 ns/iter (+/- 11.34) = 2722 MB/s
test gen_u64_chacha8         ... bench:       1,312.82 ns/iter (+/- 9.21) = 6097 MB/s
test gen_u64_os              ... bench:     287,363.80 ns/iter (+/- 1,950.13) = 27 MB/s
test gen_u64_pcg32           ... bench:       1,698.28 ns/iter (+/- 6.02) = 4711 MB/s
test gen_u64_pcg64           ... bench:       1,325.13 ns/iter (+/- 6.38) = 6037 MB/s
test gen_u64_pcg64dxsm       ... bench:       1,345.90 ns/iter (+/- 5.35) = 5947 MB/s
test gen_u64_pcg64mcg        ... bench:         931.76 ns/iter (+/- 1.62) = 8592 MB/s
test gen_u64_small           ... bench:         673.24 ns/iter (+/- 4.31) = 11887 MB/s
test gen_u64_std             ... bench:       1,850.87 ns/iter (+/- 3.75) = 4324 MB/s
test gen_u64_step            ... bench:           0.41 ns/iter (+/- 0.00) = 8000000 MB/s
test gen_u64_thread          ... bench:       2,042.33 ns/iter (+/- 12.88) = 3917 MB/s
test init_chacha             ... bench:          17.61 ns/iter (+/- 0.04)
test init_pcg32              ... bench:           4.15 ns/iter (+/- 0.03)
test init_pcg64              ... bench:           7.67 ns/iter (+/- 0.01)
test init_pcg64dxsm          ... bench:           7.46 ns/iter (+/- 0.03)
test init_pcg64mcg           ... bench:           4.03 ns/iter (+/- 0.07)
test reseeding_chacha20_16k  ... bench:   6,100,544.10 ns/iter (+/- 16,085.01) = 2750 MB/s
test reseeding_chacha20_1M   ... bench:   5,779,843.60 ns/iter (+/- 18,430.08) = 2902 MB/s
test reseeding_chacha20_256k ... bench:   5,797,746.90 ns/iter (+/- 11,876.55) = 2893 MB/s
test reseeding_chacha20_32k  ... bench:   5,940,838.90 ns/iter (+/- 15,901.71) = 2824 MB/s
test reseeding_chacha20_4k   ... bench:   7,049,439.30 ns/iter (+/- 18,340.65) = 2379 MB/s
test reseeding_chacha20_64k  ... bench:   5,859,552.60 ns/iter (+/- 6,867.71) = 2863 MB/s

New generators benchmark results

gen_bytes/step          time:   [96.118 ns 96.319 ns 96.550 ns]
                        thrpt:  [9.8775 GiB/s 9.9012 GiB/s 9.9219 GiB/s]
                 change:
                        time:   [-0.2388% -0.0153% +0.2193%] (p = 0.90 > 0.05)
                        thrpt:  [-0.2188% +0.0153% +0.2394%]
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  8 (8.00%) high mild
  8 (8.00%) high severe
gen_bytes/pcg32         time:   [356.01 ns 356.91 ns 357.88 ns]
                        thrpt:  [2.6648 GiB/s 2.6720 GiB/s 2.6788 GiB/s]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
gen_bytes/pcg64         time:   [251.88 ns 252.09 ns 252.37 ns]
                        thrpt:  [3.7789 GiB/s 3.7830 GiB/s 3.7863 GiB/s]
Found 8 outliers among 100 measurements (8.00%)
  6 (6.00%) high mild
  2 (2.00%) high severe
gen_bytes/pcg64mcg      time:   [219.68 ns 220.16 ns 220.70 ns]
                        thrpt:  [4.3211 GiB/s 4.3318 GiB/s 4.3413 GiB/s]
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high mild
gen_bytes/pcg64dxsm     time:   [253.80 ns 254.00 ns 254.21 ns]
                        thrpt:  [3.7515 GiB/s 3.7546 GiB/s 3.7575 GiB/s]
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
gen_bytes/chacha8       time:   [246.81 ns 246.99 ns 247.20 ns]
                        thrpt:  [3.8579 GiB/s 3.8611 GiB/s 3.8640 GiB/s]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  4 (4.00%) high severe
gen_bytes/chacha12      time:   [320.27 ns 320.54 ns 320.82 ns]
                        thrpt:  [2.9726 GiB/s 2.9752 GiB/s 2.9777 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
gen_bytes/chacha20      time:   [451.98 ns 452.27 ns 452.57 ns]
                        thrpt:  [2.1072 GiB/s 2.1087 GiB/s 2.1100 GiB/s]
gen_bytes/std           time:   [305.45 ns 306.22 ns 307.03 ns]
                        thrpt:  [3.1062 GiB/s 3.1144 GiB/s 3.1222 GiB/s]
gen_bytes/small         time:   [170.36 ns 170.61 ns 170.89 ns]
                        thrpt:  [5.5807 GiB/s 5.5898 GiB/s 5.5981 GiB/s]
gen_bytes/os            time:   [2.1675 µs 2.1714 µs 2.1757 µs]
                        thrpt:  [448.84 MiB/s 449.75 MiB/s 450.55 MiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
gen_bytes/thread        time:   [331.20 ns 331.67 ns 332.10 ns]
                        thrpt:  [2.8716 GiB/s 2.8754 GiB/s 2.8794 GiB/s]

gen_u32/step            time:   [208.71 ps 208.84 ps 208.97 ps]
                        thrpt:  [17.827 GiB/s 17.838 GiB/s 17.849 GiB/s]
                 change:
                        time:   [-1.8456% -1.7380% -1.5912%] (p = 0.00 < 0.05)
                        thrpt:  [+1.6170% +1.7688% +1.8803%]
                        Performance has improved.
Found 30 outliers among 1000 measurements (3.00%)
  19 (1.90%) high mild
  11 (1.10%) high severe
gen_u32/pcg32           time:   [1.0358 ns 1.0360 ns 1.0362 ns]
                        thrpt:  [3.5950 GiB/s 3.5959 GiB/s 3.5966 GiB/s]
                 change:
                        time:   [+4.0822% +4.1792% +4.2713%] (p = 0.00 < 0.05)
                        thrpt:  [-4.0963% -4.0115% -3.9220%]
                        Performance has regressed.
Found 93 outliers among 1000 measurements (9.30%)
  1 (0.10%) low severe
  19 (1.90%) high mild
  73 (7.30%) high severe
gen_u32/pcg64           time:   [1.2821 ns 1.2868 ns 1.2916 ns]
                        thrpt:  [2.8843 GiB/s 2.8949 GiB/s 2.9056 GiB/s]
                 change:
                        time:   [+1.1689% +1.5157% +1.8449%] (p = 0.00 < 0.05)
                        thrpt:  [-1.8115% -1.4931% -1.1554%]
                        Performance has regressed.
gen_u32/pcg64mcg        time:   [952.60 ps 953.13 ps 953.71 ps]
                        thrpt:  [3.9061 GiB/s 3.9085 GiB/s 3.9107 GiB/s]
Found 8 outliers among 1000 measurements (0.80%)
  1 (0.10%) high mild
  7 (0.70%) high severe
gen_u32/pcg64dxsm       time:   [1.4181 ns 1.4188 ns 1.4196 ns]
                        thrpt:  [2.6243 GiB/s 2.6257 GiB/s 2.6270 GiB/s]
Found 163 outliers among 1000 measurements (16.30%)
  86 (8.60%) high mild
  77 (7.70%) high severe
gen_u32/chacha8         time:   [965.13 ps 965.69 ps 966.35 ps]
                        thrpt:  [3.8550 GiB/s 3.8577 GiB/s 3.8599 GiB/s]
Found 193 outliers among 1000 measurements (19.30%)
  31 (3.10%) low severe
  121 (12.10%) low mild
  26 (2.60%) high mild
  15 (1.50%) high severe
gen_u32/chacha12        time:   [1.2628 ns 1.2637 ns 1.2646 ns]
                        thrpt:  [2.9457 GiB/s 2.9480 GiB/s 2.9501 GiB/s]
Found 7 outliers among 1000 measurements (0.70%)
  3 (0.30%) high mild
  4 (0.40%) high severe
gen_u32/chacha20        time:   [1.7690 ns 1.7701 ns 1.7711 ns]
                        thrpt:  [2.1033 GiB/s 2.1046 GiB/s 2.1059 GiB/s]
Found 12 outliers among 1000 measurements (1.20%)
  7 (0.70%) high mild
  5 (0.50%) high severe
gen_u32/std             time:   [1.2071 ns 1.2075 ns 1.2079 ns]
                        thrpt:  [3.0842 GiB/s 3.0852 GiB/s 3.0860 GiB/s]
Found 14 outliers among 1000 measurements (1.40%)
  7 (0.70%) high mild
  7 (0.70%) high severe
gen_u32/small           time:   [676.73 ps 678.39 ps 680.13 ps]
                        thrpt:  [5.4773 GiB/s 5.4914 GiB/s 5.5048 GiB/s]
gen_u32/os              time:   [293.12 ns 293.35 ns 293.58 ns]
                        thrpt:  [12.994 MiB/s 13.004 MiB/s 13.014 MiB/s]
Found 3 outliers among 1000 measurements (0.30%)
  3 (0.30%) high mild
gen_u32/thread          time:   [1.2436 ns 1.2449 ns 1.2462 ns]
                        thrpt:  [2.9893 GiB/s 2.9925 GiB/s 2.9955 GiB/s]
Found 87 outliers among 1000 measurements (8.70%)
  53 (5.30%) high mild
  34 (3.40%) high severe

gen_u64/step            time:   [211.35 ps 211.44 ps 211.55 ps]
                        thrpt:  [35.218 GiB/s 35.237 GiB/s 35.252 GiB/s]
Found 135 outliers among 1000 measurements (13.50%)
  20 (2.00%) low mild
  64 (6.40%) high mild
  51 (5.10%) high severe
gen_u64/pcg32           time:   [2.0725 ns 2.0731 ns 2.0740 ns]
                        thrpt:  [3.5924 GiB/s 3.5939 GiB/s 3.5951 GiB/s]
Found 146 outliers among 1000 measurements (14.60%)
  89 (8.90%) high mild
  57 (5.70%) high severe
gen_u64/pcg64           time:   [1.2850 ns 1.2887 ns 1.2924 ns]
                        thrpt:  [5.7649 GiB/s 5.7814 GiB/s 5.7980 GiB/s]
Found 1 outliers among 1000 measurements (0.10%)
  1 (0.10%) high mild
gen_u64/pcg64mcg        time:   [947.60 ps 947.97 ps 948.38 ps]
                        thrpt:  [7.8561 GiB/s 7.8595 GiB/s 7.8626 GiB/s]
Found 94 outliers among 1000 measurements (9.40%)
  77 (7.70%) high mild
  17 (1.70%) high severe
gen_u64/pcg64dxsm       time:   [1.2180 ns 1.2185 ns 1.2193 ns]
                        thrpt:  [6.1107 GiB/s 6.1143 GiB/s 6.1171 GiB/s]
Found 77 outliers among 1000 measurements (7.70%)
  32 (3.20%) high mild
  45 (4.50%) high severe
gen_u64/chacha8         time:   [1.4459 ns 1.4471 ns 1.4485 ns]
                        thrpt:  [5.1438 GiB/s 5.1486 GiB/s 5.1528 GiB/s]
Found 113 outliers among 1000 measurements (11.30%)
  1 (0.10%) low severe
  65 (6.50%) low mild
  29 (2.90%) high mild
  18 (1.80%) high severe
gen_u64/chacha12        time:   [1.9792 ns 1.9809 ns 1.9827 ns]
                        thrpt:  [3.7578 GiB/s 3.7612 GiB/s 3.7644 GiB/s]
Found 22 outliers among 1000 measurements (2.20%)
  12 (1.20%) high mild
  10 (1.00%) high severe
gen_u64/chacha20        time:   [3.0280 ns 3.0291 ns 3.0303 ns]
                        thrpt:  [2.4587 GiB/s 2.4596 GiB/s 2.4605 GiB/s]
Found 60 outliers among 1000 measurements (6.00%)
  27 (2.70%) low mild
  15 (1.50%) high mild
  18 (1.80%) high severe
gen_u64/std             time:   [1.9742 ns 1.9752 ns 1.9763 ns]
                        thrpt:  [3.7699 GiB/s 3.7722 GiB/s 3.7740 GiB/s]
Found 75 outliers among 1000 measurements (7.50%)
  6 (0.60%) low mild
  40 (4.00%) high mild
  29 (2.90%) high severe
gen_u64/small           time:   [654.74 ps 655.21 ps 655.69 ps]
                        thrpt:  [11.363 GiB/s 11.371 GiB/s 11.379 GiB/s]
Found 5 outliers among 1000 measurements (0.50%)
  4 (0.40%) high mild
  1 (0.10%) high severe
gen_u64/os              time:   [290.50 ns 290.72 ns 290.94 ns]
                        thrpt:  [26.223 MiB/s 26.243 MiB/s 26.263 MiB/s]
Found 22 outliers among 1000 measurements (2.20%)
  1 (0.10%) low mild
  12 (1.20%) high mild
  9 (0.90%) high severe
gen_u64/thread          time:   [2.0391 ns 2.0402 ns 2.0412 ns]
                        thrpt:  [3.6500 GiB/s 3.6519 GiB/s 3.6538 GiB/s]
Found 13 outliers among 1000 measurements (1.30%)
  8 (0.80%) high mild
  5 (0.50%) high severe

init_gen/pcg32          time:   [8.5311 ns 8.5418 ns 8.5523 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
init_gen/pcg64          time:   [16.844 ns 16.874 ns 16.904 ns]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) low mild
  2 (2.00%) high mild
  1 (1.00%) high severe
init_gen/pcg64mcg       time:   [7.5557 ns 7.5669 ns 7.5803 ns]
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
init_gen/pcg64dxsm      time:   [16.452 ns 16.472 ns 16.495 ns]
Found 9 outliers among 100 measurements (9.00%)
  9 (9.00%) high mild
init_gen/chacha8        time:   [26.466 ns 26.549 ns 26.639 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
init_gen/chacha12       time:   [26.301 ns 26.419 ns 26.547 ns]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
init_gen/chacha20       time:   [26.389 ns 26.471 ns 26.564 ns]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
init_gen/std            time:   [26.426 ns 26.467 ns 26.508 ns]

reseeding_bytes/chacha20_4k
                        time:   [440.00 µs 440.37 µs 440.81 µs]
                        thrpt:  [2.2154 GiB/s 2.2176 GiB/s 2.2195 GiB/s]
Found 8 outliers among 100 measurements (8.00%)
  7 (7.00%) high mild
  1 (1.00%) high severe
reseeding_bytes/chacha20_16k
                        time:   [381.24 µs 381.56 µs 381.93 µs]
                        thrpt:  [2.5569 GiB/s 2.5594 GiB/s 2.5616 GiB/s]
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) high mild
  4 (4.00%) high severe
reseeding_bytes/chacha20_32k
                        time:   [371.49 µs 371.60 µs 371.72 µs]
                        thrpt:  [2.6271 GiB/s 2.6280 GiB/s 2.6288 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
reseeding_bytes/chacha20_64k
                        time:   [367.69 µs 367.94 µs 368.21 µs]
                        thrpt:  [2.6522 GiB/s 2.6541 GiB/s 2.6559 GiB/s]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe
reseeding_bytes/chacha20_256k
                        time:   [364.63 µs 365.37 µs 366.10 µs]
                        thrpt:  [2.6675 GiB/s 2.6728 GiB/s 2.6783 GiB/s]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
reseeding_bytes/chacha20_1024k
                        time:   [365.37 µs 365.79 µs 366.25 µs]
                        thrpt:  [2.6664 GiB/s 2.6697 GiB/s 2.6728 GiB/s]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild