Compare BLAS and non-BLAS performance

Now that we've made BLAS support optional on several linfa crates, we should compare the performance of those crates with and without BLAS. Doing this requires those crates to have a complete set of benchmarks that represent realistic workloads. If BLAS turns out to provide no performance improvement, we could even remove BLAS support, improving code quality.

Benchmark status for each crate that supports BLAS:

- [x] linfa-clustering:
  - [x] Benchmarks (pretty much complete, might want to add more workloads)
  - [x] Compare BLAS and non-BLAS perf
- [x] linfa-ica:
  - [x] Benchmarks
  - [x] Compare BLAS and non-BLAS perf
- [ ] linfa-reduction:
  - [ ] Benchmarks
  - [ ] Compare BLAS and non-BLAS perf
- [x] linfa-preprocessing:
  - [x] Benchmarks
  - [x] Compare BLAS and non-BLAS perf
- [x] linfa-pls:
  - [x] Benchmarks
  - [x] Compare BLAS and non-BLAS perf
- [x] linfa-linear:
  - [x] Benchmarks
  - [x] Compare BLAS and non-BLAS perf
- [ ] linfa-elasticnet:
  - [ ] Benchmarks
  - [ ] Compare BLAS and non-BLAS perf

I'm not 100% sure about removing the optional BLAS support, because that's technically a breaking change. That type of change would be unacceptable on a crate like serde, even on a breaking release, but we might be able to get away with it.

Is there any way of getting Intel's MKL optimizations in pure Rust without using BLAS? My guess is those would be quite difficult to replicate.

Intel's MKL libraries have their own assembly routines with architecture-specific optimizations, so it's not viable to reproduce them completely in pure Rust. There is definitely room for improvement in the Rust library, though.

Thank you, that's what I thought. Perhaps someday we can get Intel itself to provide a Rust + assembly version.

With guidance, I am willing to do a BLAS vs. non-BLAS comparison for linfa-clustering.

Since benchmarks already exist for linfa-clustering, run the benchmarks with and without the BLAS feature flags and post the results. Also post your commands and the specs of your machine.
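As a rough sketch of that procedure, the helper below builds the two `cargo bench` invocations (BLAS first, then pure Rust) and prints a short machine summary to post alongside the results. The `intel-mkl-static` feature and the `-F` flag are taken from the commands used later in this thread; the function names and the `RUN_BENCHES` opt-in variable are hypothetical and exist only for this illustration.

```rust
use std::env::consts::{ARCH, OS};
use std::process::Command;
use std::thread::available_parallelism;

/// Arguments for `cargo bench` on one crate, optionally enabling a BLAS feature.
fn bench_args(package: &str, blas_feature: Option<&str>) -> Vec<String> {
    let mut args = vec!["bench".to_string(), "-p".to_string(), package.to_string()];
    if let Some(feature) = blas_feature {
        args.push("-F".to_string());
        args.push(feature.to_string());
    }
    args
}

/// A one-line description of the machine, using only what std can report.
fn machine_summary() -> String {
    let cores = available_parallelism().map(|n| n.get()).unwrap_or(1);
    format!("os = {}, arch = {}, logical cores = {}", OS, ARCH, cores)
}

fn main() {
    println!("{}", machine_summary());
    // BLAS run first, then the pure-Rust run, matching the order used in this thread.
    for feature in [Some("intel-mkl-static"), None] {
        let args = bench_args("linfa-clustering", feature);
        println!("cargo {}", args.join(" "));
        // RUN_BENCHES is a hypothetical opt-in switch: without it the sketch
        // only prints the commands instead of executing them.
        if std::env::var_os("RUN_BENCHES").is_some() {
            let status = Command::new("cargo")
                .args(&args)
                .status()
                .expect("failed to start cargo");
            println!("exit status: {status}");
        }
    }
}
```

Running the BLAS configuration first means that the change criterion prints for the second run is the pure-Rust build measured against the BLAS build.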

@YuhanLiin Linfa Clustering results attached as zip.

Conditions: Each run was made with the laptop fully charged, plugged in, and otherwise not in use.

Commands:
1st run: cargo bench -p linfa-clustering -F intel-mkl-static
2nd run: cargo bench -p linfa-clustering

System specs:

hardware:

Processor   12th Gen Intel(R) Core(TM) i5-1235U   1.30 GHz
Installed RAM   8.00 GB (7.72 GB usable)
Device ID   5F0BF67B-026D-4E93-8F42-D89620D51451
Product ID  00342-20909-03914-AAOEM
System type 64-bit operating system, x64-based processor
Pen and touch   Pen and touch support with 10 touch points

OS:

Edition Windows 11 Home
Version 22H2
Installed on    ‎9/‎29/‎2022
OS build    22621.674
Experience  Windows Feature Experience Pack 1000.22634.1000.0

criterion.zip

Update

linfa-preprocessing results are below. These benchmarks use iai instead of criterion, so they do not automatically report whether a difference is statistically significant. The last set of results shows the percentage delta between the two commands. Additionally, the linfa-preprocessing benchmarks were run under WSL because of an OS-level dependency of iai. I mention this because I am not sure whether it matters.
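Because iai reports raw counters rather than a significance test, the delta between two runs can also be recomputed by hand. The sketch below shows the arithmetic for one counter; `percent_delta` is a name invented for this example, and the two inputs are the `iai_pca_bench` estimated-cycle counts pasted in this thread. Some percentages iai printed here do not follow exactly from the pasted counts (for instance, identical instruction counts are shown with a nonzero change), presumably because iai compares against whichever run it stored last, so recomputing from the pasted numbers is a useful cross-check.

```rust
/// Relative change from `baseline` to `candidate`, in percent.
/// A positive value means the candidate used more of the counted resource.
/// `baseline` is assumed to be nonzero.
fn percent_delta(baseline: u64, candidate: u64) -> f64 {
    (candidate as f64 - baseline as f64) / baseline as f64 * 100.0
}

fn main() {
    // Estimated cycles for iai_pca_bench as pasted in this thread.
    let with_blas: u64 = 9_169_208_939;
    let without_blas: u64 = 8_969_254_954;
    println!(
        "iai_pca_bench estimated cycles, non-BLAS vs BLAS: {:+.3}%",
        percent_delta(with_blas, without_blas)
    );
}
```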

cargo bench -p linfa-preprocessing -F intel-mkl-static -q

iai_standard_scaler_bench
  Instructions:           247707707 (No change)
  L1 Accesses:            313682891 (No change)
  L2 Accesses:              6242917 (No change)
  RAM Accesses:             1076129 (No change)
  Estimated Cycles:       382561991 (No change)

iai_min_max_scaler_bench
  Instructions:           183232614 (No change)
  L1 Accesses:            216728794 (No change)
  L2 Accesses:              7319612 (No change)
  RAM Accesses:             1124133 (No change)
  Estimated Cycles:       292671509 (No change)

iai_max_abs_scaler_bench
  Instructions:           184320331 (No change)
  L1 Accesses:            194024307 (No change)
  L2 Accesses:              9669054 (No change)
  RAM Accesses:             1024959 (No change)
  Estimated Cycles:       278243142 (No change)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

iai_fit_vectorizer
  Instructions:          4293319010
  L1 Accesses:           6147254630
  L2 Accesses:             31377942
  RAM Accesses:             1427414
  Estimated Cycles:      6354103830

iai_fit_transform_vectorizer
  Instructions:          5961903539
  L1 Accesses:           8488942385
  L2 Accesses:             33558959
  RAM Accesses:             1488231
  Estimated Cycles:      8708825265

iai_fit_tf_idf
  Instructions:          4288819423
  L1 Accesses:           6143728883
  L2 Accesses:             31485644
  RAM Accesses:             1426626
  Estimated Cycles:      6351089013

iai_fit_transform_tf_idf
  Instructions:           869383776
  L1 Accesses:           1202958638
  L2 Accesses:              2996272
  RAM Accesses:              147146
  Estimated Cycles:      1223090108

iai_pca_bench
  Instructions:          4503080726
  L1 Accesses:           5984882444
  L2 Accesses:            606301143
  RAM Accesses:             4366308
  Estimated Cycles:      9169208939

iai_zca_bench
  Instructions:           593669460
  L1 Accesses:            745426778
  L2 Accesses:             20078499
  RAM Accesses:             3345546
  Estimated Cycles:       962913383

iai_cholesky_bench
  Instructions:           517218067
  L1 Accesses:            641365471
  L2 Accesses:             19722717
  RAM Accesses:             3300250
  Estimated Cycles:       855487806

cargo bench -p linfa-preprocessing -q

running 53 tests
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
test result: ok. 0 passed; 0 failed; 53 ignored; 0 measured; 0 filtered out; finished in 0.01s

iai_standard_scaler_bench
  Instructions:           247707707 (+0.024685%)
  L1 Accesses:            313683265 (+0.025857%)
  L2 Accesses:              6242496 (-0.004485%)
  RAM Accesses:             1076176 (+0.117498%)
  Estimated Cycles:       382561905 (+0.032395%)

iai_min_max_scaler_bench
  Instructions:           183232614 (+0.033374%)
  L1 Accesses:            216728467 (+0.037104%)
  L2 Accesses:              7320099 (+0.008580%)
  RAM Accesses:             1123973 (+0.094041%)
  Estimated Cycles:       292668017 (+0.041185%)

iai_max_abs_scaler_bench
  Instructions:           184320331 (+0.033177%)
  L1 Accesses:            194024734 (+0.041837%)
  L2 Accesses:              9668647 (-0.002751%)
  RAM Accesses:             1024939 (+0.116826%)
  Estimated Cycles:       278240834 (+0.043747%)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

iai_fit_vectorizer
  Instructions:          4289254656 (-0.301018%)
  L1 Accesses:           6143589442 (-0.265049%)
  L2 Accesses:             30834301 (-1.785174%)
  RAM Accesses:             1429386 (-1.773712%)
  Estimated Cycles:      6347789457 (-0.314589%)

iai_fit_transform_vectorizer
  Instructions:          5958146925 (-0.211742%)
  L1 Accesses:           8483597779 (-0.211798%)
  L2 Accesses:             32930389 (-1.922160%)
  RAM Accesses:             1478958 (-2.444305%)
  Estimated Cycles:      8700013254 (-0.258297%)

iai_fit_tf_idf
  Instructions:          4289575744 (-0.189165%)
  L1 Accesses:           6144040467 (-0.200605%)
  L2 Accesses:             30837261 (-2.111559%)
  RAM Accesses:             1428415 (-1.787255%)
  Estimated Cycles:      6348221297 (-0.260586%)

iai_fit_transform_tf_idf
  Instructions:          5960889325 (+578.7082%)
  L1 Accesses:           8486540657 (+598.1243%)
  L2 Accesses:             33019866 (+995.8834%)
  RAM Accesses:             1502558 (+758.9531%)
  Estimated Cycles:      8704229517 (+603.7655%)

iai_pca_bench
  Instructions:          4617822144 (+2.836558%)
  L1 Accesses:           5755795774 (-3.532371%)
  L2 Accesses:            612619759 (+1.044968%)
  RAM Accesses:             4296011 (-1.009489%)
  Estimated Cycles:      8969254954 (-1.973983%)

iai_zca_bench
  Instructions:           608404933 (+4.710235%)
  L1 Accesses:            741514513 (+1.982326%)
  L2 Accesses:             25116941 (+25.19884%)
  RAM Accesses:             3292475 (-0.800950%)
  Estimated Cycles:       982335843 (+4.107730%)

iai_cholesky_bench
  Instructions:           536186203 (+6.262738%)
  L1 Accesses:            646261524 (+3.727183%)
  L2 Accesses:             24747630 (+25.58514%)
  RAM Accesses:             3277475 (+0.113386%)
  Estimated Cycles:       884711299 (+5.807638%)

Also, for linfa_linear, is the expectation that the written benchmark just uses the diabetes dataset (n=1599)? Or do we want something akin to a small, medium, and large benchmark, say n = 1_000, 10_000, 100_000?

I imagine the latter would let us see whether the BLAS vs. non-BLAS tradeoff is also a function of sample size.
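A minimal sketch of such a size sweep is shown below. It is std-only so that it runs anywhere: the single-feature least-squares fit and the synthetic data generator are stand-ins for the real linfa-linear workload, and `Instant` is a stand-in for criterion or iai. All function names here are hypothetical.

```rust
use std::time::Instant;

/// Deterministic pseudo-random value in [0, 1) from a linear congruential
/// generator, so the sketch needs no external crates.
fn lcg(state: &mut u64) -> f64 {
    *state = state
        .wrapping_mul(6364136223846793005)
        .wrapping_add(1442695040888963407);
    (*state >> 11) as f64 / (1u64 << 53) as f64
}

/// Synthetic data following y = 2x + 1 plus a small uniform noise term.
fn synthetic(n: usize, seed: u64) -> (Vec<f64>, Vec<f64>) {
    let mut state = seed;
    let x: Vec<f64> = (0..n).map(|_| lcg(&mut state)).collect();
    let y: Vec<f64> = x
        .iter()
        .map(|&xi| 2.0 * xi + 1.0 + 0.01 * (lcg(&mut state) - 0.5))
        .collect();
    (x, y)
}

/// Closed-form simple least squares; returns (slope, intercept).
fn fit_line(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    let sxy: f64 = x.iter().zip(y).map(|(a, b)| (a - mean_x) * (b - mean_y)).sum();
    let sxx: f64 = x.iter().map(|a| (a - mean_x).powi(2)).sum();
    let slope = sxy / sxx;
    (slope, mean_y - slope * mean_x)
}

fn main() {
    // The three sample sizes proposed in this thread.
    for &n in &[1_000usize, 10_000, 100_000] {
        let (x, y) = synthetic(n, 42);
        let start = Instant::now();
        let (slope, intercept) = fit_line(&x, &y);
        let elapsed = start.elapsed();
        println!("n = {n:>7}: slope = {slope:.3}, intercept = {intercept:.3}, {elapsed:?}");
    }
}
```

A single `Instant` measurement is noisy; in the actual benchmark the loop body would be handed to the benchmarking framework, once per sample size, so that each size gets its own entry in the report.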

We should go with the latter.

BTW, for the linfa-clustering benchmarks, the "previous benchmark" is the one with BLAS enabled right?

Okay, I'll start a benchmarking PR for linfa_linear soon, using iai per #103.

That is correct. For linfa-clustering, the first benchmark is with BLAS enabled and the second is without.
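When reading criterion output from two configurations side by side, one rough check is whether the bracketed intervals on the `time:` lines overlap; criterion prints the lower bound, its best estimate, and the upper bound of a confidence interval. The sketch below parses such lines and compares them. It is an illustration only: the type and function names are hypothetical, non-overlap is used as a heuristic rather than a formal test, and the two sample lines are copied from the linfa-ica output in this thread.

```rust
/// Lower bound, point estimate and upper bound of a criterion `time:` line,
/// all converted to nanoseconds.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Interval {
    low: f64,
    mid: f64,
    high: f64,
}

/// Multiplier that converts a value in the given unit to nanoseconds.
fn unit_to_ns(unit: &str) -> Option<f64> {
    match unit {
        "ps" => Some(1e-3),
        "ns" => Some(1.0),
        "µs" | "us" => Some(1e3),
        "ms" => Some(1e6),
        "s" => Some(1e9),
        _ => None,
    }
}

/// Parses a line such as `time:   [63.292 µs 63.701 µs 64.270 µs]`.
/// Returns `None` for anything that is not three value/unit pairs in brackets.
fn parse_time_line(line: &str) -> Option<Interval> {
    let start = line.find('[')? + 1;
    let end = line.find(']')?;
    let tokens: Vec<&str> = line.get(start..end)?.split_whitespace().collect();
    if tokens.len() != 6 {
        return None;
    }
    let mut ns = [0.0; 3];
    for (i, pair) in tokens.chunks(2).enumerate() {
        ns[i] = pair[0].parse::<f64>().ok()? * unit_to_ns(pair[1])?;
    }
    Some(Interval { low: ns[0], mid: ns[1], high: ns[2] })
}

/// True when the two intervals share at least one point.
fn overlaps(a: Interval, b: Interval) -> bool {
    a.low <= b.high && b.low <= a.high
}

fn main() {
    // Fast ICA/GFunc_Cube/1000, without and with intel-mkl-static.
    let without_blas = parse_time_line("time:   [63.292 µs 63.701 µs 64.270 µs]").unwrap();
    let with_blas = parse_time_line("time:   [51.978 µs 52.182 µs 52.431 µs]").unwrap();
    println!("intervals overlap: {}", overlaps(without_blas, with_blas));
    println!("BLAS / non-BLAS point estimate: {:.3}", with_blas.mid / without_blas.mid);
}
```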

linfa-ica benchmarks

Context

Laptop mid charge level with wall charger power plugged in
Laptop not in use during test
Laptop not overheating

Run: cargo bench -p linfa-ica

     Running benches\fast_ica.rs (target\release\deps\fast_ica-d0c3b5752e019610.exe)
Fast ICA/GFunc_Cube/1000
                        time:   [63.292 µs 63.701 µs 64.270 µs]
Found 24 outliers among 100 measurements (24.00%)
  22 (22.00%) high mild
  2 (2.00%) high severe
Fast ICA/GFunc_Cube/10000
                        time:   [594.96 µs 599.06 µs 603.51 µs]
Found 6 outliers among 100 measurements (6.00%)
  5 (5.00%) high mild
  1 (1.00%) high severe
Fast ICA/GFunc_Cube/100000
                        time:   [10.097 ms 10.163 ms 10.236 ms]
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe

Fast ICA/GFunc_Logcosh/1000
                        time:   [80.528 µs 80.737 µs 80.965 µs]
Found 8 outliers among 100 measurements (8.00%)
  1 (1.00%) low severe
  5 (5.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Fast ICA/GFunc_Logcosh/10000
                        time:   [790.09 µs 792.99 µs 796.68 µs]
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) low mild
  7 (7.00%) high mild
  5 (5.00%) high severe
Fast ICA/GFunc_Logcosh/100000
                        time:   [11.175 ms 11.241 ms 11.311 ms]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

Fast ICA/Exp/1000       time:   [95.226 µs 95.656 µs 96.168 µs]
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe
Fast ICA/Exp/10000      time:   [930.76 µs 935.38 µs 941.12 µs]
Found 12 outliers among 100 measurements (12.00%)
  1 (1.00%) low severe
  1 (1.00%) low mild
  7 (7.00%) high mild
  3 (3.00%) high severe
Fast ICA/Exp/100000     time:   [18.221 ms 18.381 ms 18.559 ms]
Found 9 outliers among 100 measurements (9.00%)
  2 (2.00%) high mild
  7 (7.00%) high severe

Run: cargo bench -p linfa-ica -F intel-mkl-static

     Running benches\fast_ica.rs (target\release\deps\fast_ica-2e03c80ee39d2cab.exe)
Fast ICA/GFunc_Cube/1000
                        time:   [51.978 µs 52.182 µs 52.431 µs]
                        change: [-35.893% -31.248% -26.694%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe
Fast ICA/GFunc_Cube/10000
                        time:   [466.80 µs 469.63 µs 473.14 µs]
                        change: [-21.665% -20.949% -20.232%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) low mild
  6 (6.00%) high mild
  2 (2.00%) high severe
Fast ICA/GFunc_Cube/100000
                        time:   [9.2229 ms 9.3005 ms 9.3897 ms]
                        change: [-9.4661% -8.4831% -7.3959%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

Fast ICA/GFunc_Logcosh/1000
                        time:   [69.136 µs 69.490 µs 69.884 µs]
                        change: [-13.958% -13.385% -12.795%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
Fast ICA/GFunc_Logcosh/10000
                        time:   [672.37 µs 675.86 µs 680.20 µs]
                        change: [-15.858% -14.933% -13.922%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 20 outliers among 100 measurements (20.00%)
  2 (2.00%) low mild
  4 (4.00%) high mild
  14 (14.00%) high severe
Fast ICA/GFunc_Logcosh/100000
                        time:   [10.314 ms 10.412 ms 10.527 ms]
                        change: [-8.3714% -7.3728% -6.1955%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) high mild
  5 (5.00%) high severe

Fast ICA/Exp/1000       time:   [76.481 µs 76.798 µs 77.180 µs]
                        change: [-20.431% -19.929% -19.446%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  4 (4.00%) high mild
  9 (9.00%) high severe
Fast ICA/Exp/10000      time:   [708.61 µs 714.43 µs 721.13 µs]
                        change: [-23.322% -22.474% -21.608%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
Fast ICA/Exp/100000     time:   [16.429 ms 16.604 ms 16.795 ms]
                        change: [-10.997% -9.6641% -8.3325%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe

criterion.zip

The zip file is also attached. Will dump flamegraph in #262. criterion.zip

Context

Laptop fully charged
Laptop not in use during test
Laptop not overheating
Laptop plugged in

System specs:

hardware:

Processor   12th Gen Intel(R) Core(TM) i5-1235U   1.30 GHz
Installed RAM   8.00 GB (7.72 GB usable)
Device ID   5F0BF67B-026D-4E93-8F42-D89620D51451
Product ID  00342-20909-03914-AAOEM
System type 64-bit operating system, x64-based processor
Pen and touch   Pen and touch support with 10 touch points

OS:

Edition Windows 11 Home
Version 22H2
Installed on    ‎9/‎29/‎2022
OS build    22621.674
Experience  Windows Feature Experience Pack 1000.22634.1000.0

cargo bench -p linfa-ica -q

oojo12@femi-device:/mnt/c/Users/femio/Documents/linfa$ cargo bench -p linfa-ica -q
running 6 tests
iiiiii
test result: ok. 0 passed; 0 failed; 6 ignored; 0 measured; 0 filtered out; finished in 0.00s

Fast ICA/GFunc_Cube/1000
                        time:   [63.201 µs 63.223 µs 63.246 µs]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe
Fast ICA/GFunc_Cube/10000
                        time:   [556.67 µs 557.19 µs 557.86 µs]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Fast ICA/GFunc_Cube/100000
                        time:   [10.315 ms 10.340 ms 10.370 ms]
Found 7 outliers among 100 measurements (7.00%)
  5 (5.00%) high mild
  2 (2.00%) high severe

Fast ICA/GFunc_Logcosh/1000
                        time:   [93.606 µs 94.643 µs 95.681 µs]
Found 14 outliers among 100 measurements (14.00%)
  12 (12.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high severe
Fast ICA/GFunc_Logcosh/10000
                        time:   [932.63 µs 932.98 µs 933.41 µs]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Fast ICA/GFunc_Logcosh/100000
                        time:   [12.501 ms 12.526 ms 12.555 ms]
Found 10 outliers among 100 measurements (10.00%)
  7 (7.00%) high mild
  3 (3.00%) high severe

Fast ICA/Exp/1000       time:   [94.891 µs 94.935 µs 94.982 µs]
Found 7 outliers among 100 measurements (7.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  3 (3.00%) high severe
Fast ICA/Exp/10000      time:   [809.34 µs 809.89 µs 810.59 µs]
Found 14 outliers among 100 measurements (14.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  9 (9.00%) high severe
Fast ICA/Exp/100000     time:   [17.182 ms 17.236 ms 17.294 ms]
Found 11 outliers among 100 measurements (11.00%)
  6 (6.00%) high mild
  5 (5.00%) high severe

cargo bench -p linfa-ica -q -F intel-mkl-static

oojo12@femi-device:/mnt/c/Users/femio/Documents/linfa$ cargo bench -p linfa-ica -q -F intel-mkl-static
running 6 tests
iiiiii
test result: ok. 0 passed; 0 failed; 6 ignored; 0 measured; 0 filtered out; finished in 0.00s

Fast ICA/GFunc_Cube/1000
                        time:   [54.565 µs 54.610 µs 54.665 µs]
                        change: [-13.726% -13.591% -13.443%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) high mild
  7 (7.00%) high severe
Fast ICA/GFunc_Cube/10000
                        time:   [469.52 µs 473.98 µs 478.95 µs]
                        change: [-16.749% -16.201% -15.667%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe
Fast ICA/GFunc_Cube/100000
                        time:   [10.505 ms 10.600 ms 10.699 ms]
                        change: [+1.6209% +2.5106% +3.6220%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 13 outliers among 100 measurements (13.00%)
  13 (13.00%) high mild

Fast ICA/GFunc_Logcosh/1000
                        time:   [91.057 µs 92.249 µs 93.431 µs]
                        change: [-6.2060% -5.4401% -4.7246%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 22 outliers among 100 measurements (22.00%)
  1 (1.00%) low mild
  4 (4.00%) high mild
  17 (17.00%) high severe
Fast ICA/GFunc_Logcosh/10000
                        time:   [916.11 µs 919.42 µs 922.88 µs]
                        change: [-1.3843% -1.0460% -0.6588%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Fast ICA/GFunc_Logcosh/100000
                        time:   [13.872 ms 14.020 ms 14.171 ms]
                        change: [+10.756% +11.932% +13.114%] (p = 0.00 < 0.05)
                        Performance has regressed.

Fast ICA/Exp/1000       time:   [84.897 µs 85.525 µs 86.097 µs]
                        change: [-9.5933% -9.2310% -8.8506%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 9 outliers among 100 measurements (9.00%)
  5 (5.00%) low severe
  2 (2.00%) low mild
  2 (2.00%) high mild
Fast ICA/Exp/10000      time:   [713.15 µs 715.78 µs 718.45 µs]
                        change: [-11.707% -11.404% -11.114%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
Fast ICA/Exp/100000     time:   [19.594 ms 19.828 ms 20.067 ms]
                        change: [+13.723% +15.042% +16.547%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild

cargo bench -p linfa-linear -q

oojo12@femi-device:/mnt/c/Users/femio/Documents/linfa$ cargo bench -p linfa-linear -q

running 69 tests
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
test result: ok. 0 passed; 0 failed; 69 ignored; 0 measured; 0 filtered out; finished in 0.00s

Linfa_linear/OLS-5Feats/1000
                        time:   [15.159 µs 15.213 µs 15.279 µs]
Linfa_linear/GLM-5Feats/1000
                        time:   [352.49 µs 352.81 µs 353.17 µs]
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Linfa_linear/OLS-5Feats/10000
                        time:   [180.66 µs 181.26 µs 181.95 µs]
Linfa_linear/GLM-5Feats/10000
                        time:   [3.3059 ms 3.3098 ms 3.3143 ms]
Found 9 outliers among 100 measurements (9.00%)
  6 (6.00%) high mild
  3 (3.00%) high severe
Linfa_linear/OLS-5Feats/100000
                        time:   [3.2894 ms 3.3143 ms 3.3397 ms]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) low mild
  4 (4.00%) high mild
  1 (1.00%) high severe
Benchmarking Linfa_linear/GLM-5Feats/100000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.2s, or reduce sample count to 40.
Linfa_linear/GLM-5Feats/100000
                        time:   [101.81 ms 102.74 ms 103.75 ms]
Found 4 outliers among 100 measurements (4.00%)
  4 (4.00%) high mild
Linfa_linear/OLS-10Feats/100000
                        time:   [11.356 ms 11.464 ms 11.571 ms]
Benchmarking Linfa_linear/GLM-10Feats/100000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 9.6s, or reduce sample count to 50.
Linfa_linear/GLM-10Feats/100000
                        time:   [95.747 ms 96.903 ms 98.005 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) low mild

cargo bench -p linfa-linear -q -F intel-mkl-static

oojo12@femi-device:/mnt/c/Users/femio/Documents/linfa$ cargo bench -p linfa-linear -q -F intel-mkl-static
running 69 tests
iiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiiii
test result: ok. 0 passed; 0 failed; 69 ignored; 0 measured; 0 filtered out; finished in 0.00s

Linfa_linear/OLS-5Feats/1000
                        time:   [13.778 µs 13.828 µs 13.864 µs]
                        change: [-13.505% -12.651% -11.689%] (p = 0.00 < 0.05)
                        Performance has improved.
Linfa_linear/GLM-5Feats/1000
                        time:   [295.62 µs 296.73 µs 298.11 µs]
                        change: [-14.818% -14.391% -13.879%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Linfa_linear/OLS-5Feats/10000
                        time:   [168.71 µs 168.78 µs 168.87 µs]
                        change: [-7.9615% -7.6014% -7.2511%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  4 (4.00%) high mild
  1 (1.00%) high severe
Linfa_linear/GLM-5Feats/10000
                        time:   [2.2417 ms 2.2426 ms 2.2436 ms]
                        change: [-32.339% -32.244% -32.159%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  6 (6.00%) high mild
Linfa_linear/OLS-5Feats/100000
                        time:   [2.7646 ms 2.7759 ms 2.7892 ms]
                        change: [-16.990% -16.246% -15.524%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Benchmarking Linfa_linear/GLM-5Feats/100000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 15.8s, or reduce sample count to 30.
Linfa_linear/GLM-5Feats/100000
                        time:   [154.41 ms 155.21 ms 156.07 ms]
                        change: [+49.403% +51.072% +52.729%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Linfa_linear/OLS-10Feats/100000
                        time:   [9.6127 ms 9.6585 ms 9.7072 ms]
                        change: [-16.607% -15.750% -14.808%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Benchmarking Linfa_linear/GLM-10Feats/100000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 10.5s, or reduce sample count to 40.
Linfa_linear/GLM-10Feats/100000
                        time:   [104.48 ms 105.18 ms 105.95 ms]
                        change: [+7.1473% +8.5397% +10.042%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

cargo bench -p linfa-pls -q

oojo12@femi-device:/mnt/c/Users/femio/Documents/linfa$ cargo bench -p linfa-pls -q
running 25 tests
iiiiiiiiiiiiiiiiiiiiiiiii
test result: ok. 0 passed; 0 failed; 25 ignored; 0 measured; 0 filtered out; finished in 0.00s

Linfa_pls/Regression-Nipals-5Feats/1000
                        time:   [208.09 µs 208.19 µs 208.33 µs]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/1000
                        time:   [7.9120 ns 7.9683 ns 8.0416 ns]
Linfa_pls/Cca-Nipals-5Feats/1000
                        time:   [8.1832 ns 8.3129 ns 8.4271 ns]
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe
Linfa_pls/Regression-Nipals-5Feats/10000
                        time:   [2.0402 ms 2.0407 ms 2.0413 ms]
Found 9 outliers among 100 measurements (9.00%)
  4 (4.00%) high mild
  5 (5.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/10000
                        time:   [7.8653 ns 7.8872 ns 7.9119 ns]
Found 15 outliers among 100 measurements (15.00%)
  9 (9.00%) high mild
  6 (6.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/10000
                        time:   [7.8686 ns 7.8951 ns 7.9242 ns]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) high mild
  8 (8.00%) high severe
Linfa_pls/Regression-Nipals-5Feats/100000
                        time:   [25.019 ms 25.074 ms 25.137 ms]
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/100000
                        time:   [8.3477 ns 8.4512 ns 8.5605 ns]
Found 23 outliers among 100 measurements (23.00%)
  19 (19.00%) low severe
  4 (4.00%) low mild
Linfa_pls/Cca-Nipals-5Feats/100000
                        time:   [8.8812 ns 8.9391 ns 9.0124 ns]
Found 22 outliers among 100 measurements (22.00%)
  1 (1.00%) low mild
  5 (5.00%) high mild
  16 (16.00%) high severe
Linfa_pls/Regression-Nipals-10Feats/100000
                        time:   [44.065 ms 45.090 ms 46.250 ms]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
Linfa_pls/Canonical-Nipals-10Feats/100000
                        time:   [8.9908 ns 9.0708 ns 9.1734 ns]
Linfa_pls/Cca-Nipals-10Feats/100000
                        time:   [9.5099 ns 9.6530 ns 9.8343 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe
Linfa_pls/Regression-Svd-5Feats/1000
                        time:   [213.69 µs 215.88 µs 218.25 µs]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe
Linfa_pls/Canonical-Svd-5Feats/1000
                        time:   [9.7439 ns 10.166 ns 10.639 ns]
Found 15 outliers among 100 measurements (15.00%)
  2 (2.00%) high mild
  13 (13.00%) high severe
Linfa_pls/Cca-Svd-5Feats/1000
                        time:   [10.236 ns 10.311 ns 10.392 ns]
Found 5 outliers among 100 measurements (5.00%)
  3 (3.00%) high mild
  2 (2.00%) high severe
Linfa_pls/Regression-Svd-5Feats/10000
                        time:   [2.1973 ms 2.2315 ms 2.2708 ms]
Found 7 outliers among 100 measurements (7.00%)
  4 (4.00%) high mild
  3 (3.00%) high severe
Linfa_pls/Canonical-Svd-5Feats/10000
                        time:   [8.8178 ns 8.8215 ns 8.8253 ns]
Found 8 outliers among 100 measurements (8.00%)
  3 (3.00%) low mild
  2 (2.00%) high mild
  3 (3.00%) high severe
Linfa_pls/Cca-Svd-5Feats/10000
                        time:   [8.8147 ns 8.8184 ns 8.8233 ns]
Found 10 outliers among 100 measurements (10.00%)
  2 (2.00%) low severe
  4 (4.00%) low mild
  2 (2.00%) high mild
  2 (2.00%) high severe
Linfa_pls/Regression-Svd-5Feats/100000
                        time:   [24.624 ms 24.814 ms 24.990 ms]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) low severe
  2 (2.00%) low mild
Linfa_pls/Canonical-Svd-5Feats/100000
                        time:   [9.1704 ns 9.2239 ns 9.2828 ns]
Linfa_pls/Cca-Svd-5Feats/100000
                        time:   [9.1081 ns 9.1352 ns 9.1682 ns]
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild
Linfa_pls/Regression-Svd-10Feats/100000
                        time:   [36.395 ms 36.598 ms 36.812 ms]
Found 3 outliers among 100 measurements (3.00%)
  3 (3.00%) high mild
Linfa_pls/Canonical-Svd-10Feats/100000
                        time:   [9.2100 ns 9.2693 ns 9.3374 ns]
Linfa_pls/Cca-Svd-10Feats/100000
                        time:   [8.8536 ns 8.8783 ns 8.9081 ns]

cargo bench -p linfa-pls -q -F intel-mkl-static

oojo12@femi-device:/mnt/c/Users/femio/Documents/linfa$ cargo bench -p linfa-pls -q -F intel-mkl-static
running 25 tests
iiiiiiiiiiiiiiiiiiiiiiiii
test result: ok. 0 passed; 0 failed; 25 ignored; 0 measured; 0 filtered out; finished in 0.00s

Linfa_pls/Regression-Nipals-5Feats/1000
                        time:   [218.90 µs 221.13 µs 223.47 µs]
                        change: [+1.3774% +2.6556% +3.9910%] (p = 0.00 < 0.05)
                        Performance has regressed.
Linfa_pls/Canonical-Nipals-5Feats/1000
                        time:   [8.6595 ns 8.6959 ns 8.7395 ns]
                        change: [+5.9673% +7.1422% +8.3531%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) high mild
  2 (2.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/1000
                        time:   [8.7229 ns 8.7770 ns 8.8314 ns]
                        change: [+7.1095% +8.1816% +9.2140%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
Linfa_pls/Regression-Nipals-5Feats/10000
                        time:   [2.1112 ms 2.1290 ms 2.1477 ms]
                        change: [+3.4751% +4.3265% +5.2747%] (p = 0.00 < 0.05)
                        Performance has regressed.
Linfa_pls/Canonical-Nipals-5Feats/10000
                        time:   [8.5796 ns 8.6079 ns 8.6421 ns]
                        change: [+8.3916% +8.8811% +9.3770%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) low severe
  3 (3.00%) low mild
  1 (1.00%) high mild
  11 (11.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/10000
                        time:   [7.7817 ns 7.8567 ns 7.9529 ns]
                        change: [+2.0146% +3.2795% +4.5430%] (p = 0.00 < 0.05)
                        Performance has regressed.
Linfa_pls/Regression-Nipals-5Feats/100000
                        time:   [23.306 ms 23.380 ms 23.462 ms]
                        change: [-7.1240% -6.7554% -6.3878%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  9 (9.00%) high mild
  4 (4.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/100000
                        time:   [7.6522 ns 7.6820 ns 7.7210 ns]
                        change: [-9.0386% -7.7480% -6.4691%] (p = 0.00 < 0.05)
                        Performance has improved.
Linfa_pls/Cca-Nipals-5Feats/100000
                        time:   [7.6062 ns 7.6240 ns 7.6486 ns]
                        change: [-14.799% -14.298% -13.788%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 16 outliers among 100 measurements (16.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  11 (11.00%) high severe
Linfa_pls/Regression-Nipals-10Feats/100000
                        time:   [31.430 ms 31.724 ms 32.066 ms]
                        change: [-31.536% -29.642% -27.860%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  6 (6.00%) high mild
  11 (11.00%) high severe
Linfa_pls/Canonical-Nipals-10Feats/100000
                        time:   [8.5970 ns 8.7048 ns 8.8243 ns]
                        change: [-9.8506% -7.7853% -5.1734%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  3 (3.00%) low mild
  13 (13.00%) high mild
  5 (5.00%) high severe
Linfa_pls/Cca-Nipals-10Feats/100000
                        time:   [8.2053 ns 8.3359 ns 8.4771 ns]
                        change: [-26.655% -23.143% -19.891%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  14 (14.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  1 (1.00%) high severe
Linfa_pls/Regression-Svd-5Feats/1000
                        time:   [174.57 µs 174.64 µs 174.71 µs]
                        change: [-19.443% -18.597% -17.826%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 6 outliers among 100 measurements (6.00%)
  2 (2.00%) low mild
  3 (3.00%) high mild
  1 (1.00%) high severe
Linfa_pls/Canonical-Svd-5Feats/1000
                        time:   [7.6545 ns 7.6862 ns 7.7219 ns]
                        change: [-22.915% -20.565% -18.412%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  4 (4.00%) high mild
  4 (4.00%) high severe
Linfa_pls/Cca-Svd-5Feats/1000
                        time:   [7.6873 ns 7.7222 ns 7.7659 ns]
                        change: [-24.608% -23.624% -22.684%] (p = 0.00 < 0.05)
                        Performance has improved.
Benchmarking Linfa_pls/Regression-Svd-5Feats/10000: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 8.5s, enable flat sampling, or reduce sample count to 50.
Linfa_pls/Regression-Svd-5Feats/10000
                        time:   [1.6892 ms 1.6901 ms 1.6913 ms]
                        change: [-25.447% -24.119% -22.944%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  9 (9.00%) high mild
  5 (5.00%) high severe
Linfa_pls/Canonical-Svd-5Feats/10000
                        time:   [7.6419 ns 7.6752 ns 7.7131 ns]
                        change: [-13.385% -13.067% -12.712%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 17 outliers among 100 measurements (17.00%)
  5 (5.00%) high mild
  12 (12.00%) high severe
Linfa_pls/Cca-Svd-5Feats/10000
                        time:   [7.6413 ns 7.6644 ns 7.6900 ns]
                        change: [-13.127% -12.772% -12.346%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  7 (7.00%) high mild
  5 (5.00%) high severe
Linfa_pls/Regression-Svd-5Feats/100000
                        time:   [19.110 ms 19.183 ms 19.268 ms]
                        change: [-23.324% -22.693% -22.010%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 7 outliers among 100 measurements (7.00%)
  3 (3.00%) high mild
  4 (4.00%) high severe
Linfa_pls/Canonical-Svd-5Feats/100000
                        time:   [7.7722 ns 7.8342 ns 7.9111 ns]
                        change: [-13.952% -12.946% -11.929%] (p = 0.00 < 0.05)
                        Performance has improved.
Linfa_pls/Cca-Svd-5Feats/100000
                        time:   [7.6404 ns 7.6651 ns 7.6930 ns]
                        change: [-16.846% -16.515% -16.161%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 12 outliers among 100 measurements (12.00%)
  8 (8.00%) high mild
  4 (4.00%) high severe
Linfa_pls/Regression-Svd-10Feats/100000
                        time:   [28.623 ms 29.112 ms 29.658 ms]
                        change: [-21.816% -20.454% -18.806%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  1 (1.00%) high mild
  12 (12.00%) high severe
Linfa_pls/Canonical-Svd-10Feats/100000
                        time:   [7.6373 ns 7.6673 ns 7.7036 ns]
                        change: [-20.068% -19.433% -18.792%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe
Linfa_pls/Cca-Svd-10Feats/100000
                        time:   [7.7171 ns 7.7675 ns 7.8295 ns]
                        change: [-13.947% -13.029% -12.112%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 21 outliers among 100 measurements (21.00%)
  13 (13.00%) high mild
  8 (8.00%) high severe

Why are the ICA results you posted different from mine? In your post the discrepancy for ICA isn't as high.

Some thoughts

1. The Criterion docs contain some insights.
2. I'm not a compiler expert, but we are using different machines, so is it possible that my build compiled to something different?
3. One of them could have been a spurious result. We can raise the significance level of the tests to make them more robust to this.
4. The context in which you run the tests matters; see this.

If the post you mean is mine from the related PR, then it might just be a subtle difference in the conditions under which the two were run, per point 4.

It might actually be a good idea to raise the significance level to .99 or 1, since we don't care about small changes in performance but rather large ones that warrant the additional code complexity of maintaining BLAS support. Additionally, it might be best practice to run the benchmarks twice to offset the chance of a spurious result.

Quick question: in the non-BLAS case we are using linfa-linalg, correct? If so, has that package itself been benchmarked against the BLAS alternative? I'm just wondering whether that is a more direct comparison that gets at what we are interested in here.

Rationale:

It seems there are only two things that would cause a performance difference between BLAS and non-BLAS:

The algorithmic complexity and space complexity difference between the two implementations
How the compiler optimizes the code

In my view it might be better to benchmark the Rust implementation against BLAS directly, since it would be a dependency for future algorithms. That way, if we imagine linfa-linalg has a more efficient SVD implementation than BLAS, we'd be confident that all algorithms using it instead of BLAS are going to perform better. The other approach, benchmarking the algorithms, makes it seem to me that we'd have to benchmark each algorithm in its BLAS and non-BLAS versions to be confident that performance doesn't take a hit, as opposed to being confident that anything using the non-BLAS SVD implementation will perform better than the BLAS version.

Scenarios

Further, we can imagine two scenarios. Suppose there are algorithms A, B, C, and D, of which only A and B are better than their BLAS versions.

Scenario I: we happen to implement algorithms that rely on A and B. In this case we falsely conclude that we can remove the BLAS dependencies. Also, by removing them we are left with only a one-time benchmark to compare performance, which the community would question as the project evolves.

Scenario II: we happen to implement algorithms that depend on C and D. In this case we again draw a false conclusion, this time that BLAS performance is better overall. However, here we update the crate that is meant to replace BLAS and now have a benchmark in place to compare the two as releases are made. We can then confidently release and remove the BLAS dependency, and can easily track from release to release whether BLAS ever becomes substantially better over the long term (in which case BLAS support would be warranted). Additionally, it could increase community adoption of the BLAS alternative.

Hope the explanations make sense.

The only difference between the first and second ICA benchmarks is the charge level (the first was run at half power and the second at full power). I guess something like this can make a big difference. I'll update the ICA issue with the new results. I think what you want is to reduce the significance level, not increase it, and I'm willing to accept that. You can also increase the noise threshold. Instead of running the benchmarks twice, we can just increase the sample size and measurement time to run more iterations and increase confidence in our results. Running benchmarks twice just adds confusion when we get two different results.
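To make the interplay between the significance level and the noise threshold concrete, here is a rough sketch of how the verdict lines in the logs in this thread appear to be chosen. The `Verdict` enum and `classify` function are hypothetical names for illustration; this is inferred from the printed output, not taken from Criterion's source.

```rust
/// Outcome labels mirroring the messages printed in the benchmark logs.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// `lower` and `upper` bound the relative change as fractions (0.05 = 5%).
/// Assumption: a change only counts when the p-value is below the
/// significance level AND the whole interval lies outside the noise band.
fn classify(p_value: f64, lower: f64, upper: f64, significance: f64, noise: f64) -> Verdict {
    if p_value >= significance {
        return Verdict::NoChange;
    }
    if upper < -noise {
        Verdict::Improved
    } else if lower > noise {
        Verdict::Regressed
    } else {
        Verdict::WithinNoise
    }
}

fn main() {
    // Interval [-5.3093% .. -4.5169%], noise threshold 5%: straddles the band edge.
    assert_eq!(
        classify(0.00, -0.053093, -0.045169, 0.05, 0.05),
        Verdict::WithinNoise
    );
    // p = 0.45 is above the 0.02 significance level.
    assert_eq!(
        classify(0.45, -0.031880, 0.016559, 0.02, 0.05),
        Verdict::NoChange
    );
    // Interval [+66.261% .. +67.292%] lies entirely above the noise band.
    assert_eq!(
        classify(0.00, 0.66261, 0.67292, 0.05, 0.05),
        Verdict::Regressed
    );
    println!("all verdicts match the logged messages");
}
```

Under this reading, lowering the significance level makes "No change" more likely for borderline p-values, while raising the noise threshold moves small but statistically real changes into "within noise".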

So how about the below to increase confidence?

Up the confidence level to 97% (default 95%): yields 97% confidence that the true runtime lies within our estimated confidence interval
Up the measurement time to 10s (default 5s): the time allotted to collect the samples for a benchmark; if it is too low, Criterion will automatically increase it
Up the sample size to 200 (default 100): the number of samples in a run
Up the warm-up time to 10s (default 3s): the time given to the machine to adjust to the load
Increase the noise threshold to 5% (default 0.01, i.e. 1%): changes smaller than 5% will be ignored
Decrease the significance level to 0.02 (default 0.05): for hypothesis testing in Criterion's context, the default means about 5% of identical benchmarks will be considered different due to noise
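As a configuration sketch, the settings above map onto Criterion's builder methods roughly as follows. It needs the `criterion` crate as a dev-dependency, and `robust_config` and `placeholder_bench` are made-up names for illustration; the real benchmarks would fit a linfa model instead of summing integers.

```rust
use std::hint::black_box;
use std::time::Duration;

use criterion::{criterion_group, criterion_main, Criterion};

// Hypothetical helper collecting the proposed settings in one place.
fn robust_config() -> Criterion {
    Criterion::default()
        .confidence_level(0.97)
        .measurement_time(Duration::from_secs(10))
        .sample_size(200)
        .warm_up_time(Duration::from_secs(10))
        .noise_threshold(0.05)
        .significance_level(0.02)
}

// Placeholder workload standing in for an actual linfa benchmark.
fn placeholder_bench(c: &mut Criterion) {
    c.bench_function("placeholder_sum", |b| {
        b.iter(|| (0..1_000u64).map(black_box).sum::<u64>())
    });
}

criterion_group! {
    name = benches;
    config = robust_config();
    targets = placeholder_bench
}
criterion_main!(benches);
```

Keeping the settings in a single function would let every crate's benchmarks share the same configuration, assuming that is what we want.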

If agreed I'll update the benchmarking section of the Contribute.md

Try running a few benchmarks with these changes. If they don't take too long, then we can accept them.

Regarding benchmarking linfa-linalg directly: if we were to do a general comparison between linfa-linalg and BLAS, then linfa-linalg would definitely be slower. BLAS has been micro-optimized to hell and back over multiple decades, so there's no way we'll catch up either. However, I'm only interested in the performance of the linfa ML algorithms, which only exercise a subset of the linalg algorithms with a subset of the possible inputs. Reaching performance parity on this subset of BLAS functionality is much easier than on the entirety of all BLAS use cases. As much as I'd like to say that all non-BLAS SVD algorithms will perform just as well as their BLAS counterparts, it's not going to happen with linfa-linalg.

New algorithms using linalg routines may perform worse without BLAS than with BLAS. Practically speaking, once BLAS has been removed from the existing algorithms, we won't care about the BLAS performance of new algorithms. Rather, we'll use benchmarks and profiling to improve the non-BLAS performance to an acceptable level. This is acceptable because most new algorithms don't use BLAS at all, so it's unlikely that we'll run into this problem in the first place.

Gotcha, that makes sense. As for the new settings, here are the results. Note that I upped the measurement time to 15 and still got a warning message suggesting that I either reduce the sample size or increase the measurement time slightly for one of the parameterized benchmarks.

Normal - 1min 39secs
Robust - 3min 38secs

That warning disappeared at 20 and doesn't appear if I drop sample size down to 150 and keep measurement time at 15.

Testing was done with linfa-ica

I don't really care about that warning. We can keep the sample size and measurement time the same.

Can you make a PR for the benchmark changes so we don't need to discuss them here?

Context

Laptop fully charged
Laptop not in use during test
Laptop not overheating
Laptop plugged in

System specs:

hardware:

Processor   12th Gen Intel(R) Core(TM) i5-1235U   1.30 GHz
Installed RAM   8.00 GB (7.72 GB usable)
Device ID   5F0BF67B-026D-4E93-8F42-D89620D51451
Product ID  00342-20909-03914-AAOEM
System type 64-bit operating system, x64-based processor
Pen and touch   Pen and touch support with 10 touch points

OS:

Edition Windows 11 Home
Version 22H2
Installed on    9/29/2022
OS build    22621.674
Experience  Windows Feature Experience Pack 1000.22634.1000.0

Algorithms

For all of the algorithm results depicted below, the first run was without BLAS and the second run was with BLAS. I also ran with the updated Criterion branch.

Linfa PLS

Linfa_pls/Regression-Nipals-5Feats/1000
                        time:   [344.24 µs 357.38 µs 373.74 µs]
                        change: [+36.794% +44.166% +53.327%] (p = 0.00 < 0.05)
                        Performance has regressed.
Linfa_pls/Canonical-Nipals-5Feats/1000
                        time:   [9.6842 ns 9.7007 ns 9.7215 ns]
                        change: [-5.3093% -4.9069% -4.5169%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 33 outliers among 200 measurements (16.50%)
  11 (5.50%) high mild
  22 (11.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/1000
                        time:   [9.6890 ns 9.7024 ns 9.7189 ns]
                        change: [-5.7148% -4.8418% -3.7950%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 26 outliers among 200 measurements (13.00%)
  9 (4.50%) high mild
  17 (8.50%) high severe
Linfa_pls/Regression-Nipals-5Feats/10000
                        time:   [3.1733 ms 3.1816 ms 3.1909 ms]
                        change: [-7.0568% -6.7312% -6.4065%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 200 measurements (5.50%)
  5 (2.50%) high mild
  6 (3.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/10000
                        time:   [12.144 ns 12.381 ns 12.578 ns]
                        change: [+6.0338% +8.4908% +11.165%] (p = 0.00 < 0.05)
                        Performance has regressed.
Linfa_pls/Cca-Nipals-5Feats/10000
                        time:   [10.821 ns 11.124 ns 11.472 ns]
                        change: [+17.739% +20.082% +22.441%] (p = 0.00 < 0.05)
                        Performance has regressed.
Linfa_pls/Regression-Nipals-5Feats/100000
                        time:   [44.548 ms 47.182 ms 49.851 ms]
                        change: [+20.101% +26.879% +33.073%] (p = 0.00 < 0.05)
                        Performance has regressed.
Linfa_pls/Canonical-Nipals-5Feats/100000
                        time:   [13.124 ns 13.139 ns 13.156 ns]
                        change: [+26.855% +27.495% +28.096%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 15 outliers among 200 measurements (7.50%)
  1 (0.50%) low severe
  1 (0.50%) low mild
  1 (0.50%) high mild
  12 (6.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/100000
                        time:   [9.6890 ns 9.7102 ns 9.7384 ns]
                        change: [-5.7419% -5.2295% -4.7242%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 26 outliers among 200 measurements (13.00%)
  9 (4.50%) high mild
  17 (8.50%) high severe
Benchmarking Linfa_pls/Regression-Nipals-10Feats/100000: Warming up for 10.000 s
Warning: Unable to complete 200 samples in 10.0s. You may wish to increase target time to 12.1s, or reduce sample count to 160.
Linfa_pls/Regression-Nipals-10Feats/100000
                        time:   [82.851 ms 82.988 ms 83.140 ms]
                        change: [+66.261% +66.774% +67.292%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 8 outliers among 200 measurements (4.00%)
  4 (2.00%) high mild
  4 (2.00%) high severe
Linfa_pls/Canonical-Nipals-10Feats/100000
                        time:   [10.945 ns 11.265 ns 11.627 ns]
                        change: [+18.124% +20.740% +22.820%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 58 outliers among 200 measurements (29.00%)
  46 (23.00%) low severe
  1 (0.50%) low mild
  4 (2.00%) high mild
  7 (3.50%) high severe
Linfa_pls/Cca-Nipals-10Feats/100000
                        time:   [9.6863 ns 9.6991 ns 9.7153 ns]
                        change: [-5.5358% -4.9845% -4.4494%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 28 outliers among 200 measurements (14.00%)
  8 (4.00%) high mild
  20 (10.00%) high severe
Linfa_pls/Regression-Svd-5Feats/1000
                        time:   [292.50 µs 293.19 µs 294.09 µs]
                        change: [-9.4257% -8.4861% -7.6607%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 200 measurements (5.00%)
  8 (4.00%) high mild
  2 (1.00%) high severe
Linfa_pls/Canonical-Svd-5Feats/1000
                        time:   [9.6979 ns 9.7179 ns 9.7438 ns]
                        change: [-10.038% -8.2813% -6.4957%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 24 outliers among 200 measurements (12.00%)
  8 (4.00%) high mild
  16 (8.00%) high severe
Linfa_pls/Cca-Svd-5Feats/1000
                        time:   [9.6933 ns 9.7073 ns 9.7240 ns]
                        change: [-13.856% -11.941% -10.039%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 25 outliers among 200 measurements (12.50%)
  12 (6.00%) high mild
  13 (6.50%) high severe
Linfa_pls/Regression-Svd-5Feats/10000
                        time:   [2.8835 ms 2.8900 ms 2.8975 ms]
                        change: [-37.503% -33.962% -30.066%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 200 measurements (5.50%)
  4 (2.00%) high mild
  7 (3.50%) high severe
Linfa_pls/Canonical-Svd-5Feats/10000
                        time:   [9.6885 ns 9.7073 ns 9.7301 ns]
                        change: [-7.8209% -5.4202% -4.1597%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 32 outliers among 200 measurements (16.00%)
  8 (4.00%) high mild
  24 (12.00%) high severe
Linfa_pls/Cca-Svd-5Feats/10000
                        time:   [9.6913 ns 9.7106 ns 9.7336 ns]
                        change: [-4.8600% -4.3851% -3.9269%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 30 outliers among 200 measurements (15.00%)
  11 (5.50%) high mild
  19 (9.50%) high severe
Linfa_pls/Regression-Svd-5Feats/100000
                        time:   [32.291 ms 32.372 ms 32.466 ms]
                        change: [-5.3005% -4.7793% -4.2841%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 200 measurements (7.00%)
  7 (3.50%) high mild
  7 (3.50%) high severe
Linfa_pls/Canonical-Svd-5Feats/100000
                        time:   [9.6894 ns 9.7054 ns 9.7258 ns]
                        change: [-4.9622% -4.4433% -3.9503%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 30 outliers among 200 measurements (15.00%)
  9 (4.50%) high mild
  21 (10.50%) high severe
Linfa_pls/Cca-Svd-5Feats/100000
                        time:   [9.6950 ns 9.7104 ns 9.7292 ns]
                        change: [-13.648% -11.669% -9.7167%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 200 measurements (11.50%)
  3 (1.50%) high mild
  20 (10.00%) high severe
Linfa_pls/Regression-Svd-10Feats/100000
                        time:   [52.774 ms 55.614 ms 58.474 ms]
                        change: [-24.889% -19.201% -13.529%] (p = 0.00 < 0.05)
                        Performance has improved.
Linfa_pls/Canonical-Svd-10Feats/100000
                        time:   [9.6883 ns 9.7062 ns 9.7283 ns]
                        change: [-5.8813% -4.9856% -4.2806%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 27 outliers among 200 measurements (13.50%)
  6 (3.00%) high mild
  21 (10.50%) high severe
Linfa_pls/Cca-Svd-10Feats/100000
                        time:   [9.6895 ns 9.7063 ns 9.7268 ns]
                        change: [-5.8981% -4.8350% -3.8458%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 29 outliers among 200 measurements (14.50%)
  10 (5.00%) high mild
  19 (9.50%) high severe

Linfa-linear

Linfa_linear/OLS-5Feats/1000
                        time:   [18.425 µs 20.095 µs 22.118 µs]
                        change: [+81.182% +91.628% +104.40%] (p = 0.00 < 0.02)
                        Performance has regressed.
Linfa_linear/GLM-5Feats/1000
                        time:   [347.05 µs 375.86 µs 406.45 µs]
                        change: [+41.988% +47.201% +51.446%] (p = 0.00 < 0.02)
                        Performance has regressed.
Found 49 outliers among 200 measurements (24.50%)
  31 (15.50%) low severe
  7 (3.50%) high mild
  11 (5.50%) high severe
Linfa_linear/OLS-5Feats/10000
                        time:   [175.92 µs 182.05 µs 190.11 µs]
                        change: [-3.1880% -0.9899% +1.6559%] (p = 0.45 > 0.02)
                        No change in performance detected.
Found 7 outliers among 200 measurements (3.50%)
  2 (1.00%) high mild
  5 (2.50%) high severe
Linfa_linear/GLM-5Feats/10000
                        time:   [2.1292 ms 2.1361 ms 2.1462 ms]
                        change: [-23.912% -23.612% -23.237%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 14 outliers among 200 measurements (7.00%)
  10 (5.00%) high mild
  4 (2.00%) high severe
Linfa_linear/OLS-5Feats/100000
                        time:   [5.7795 ms 5.9988 ms 6.2028 ms]
                        change: [+66.304% +72.788% +79.367%] (p = 0.00 < 0.02)
                        Performance has regressed.
Found 50 outliers among 200 measurements (25.00%)
  48 (24.00%) low severe
  1 (0.50%) low mild
  1 (0.50%) high mild
Benchmarking Linfa_linear/GLM-5Feats/100000: Warming up for 10.000 s
Warning: Unable to complete 200 samples in 10.0s. You may wish to increase target time to 34.3s, or reduce sample count to 50.
Linfa_linear/GLM-5Feats/100000
                        time:   [124.87 ms 132.96 ms 141.01 ms]
                        change: [+120.60% +134.56% +148.17%] (p = 0.00 < 0.02)
                        Performance has regressed.
Linfa_linear/OLS-10Feats/100000
                        time:   [11.413 ms 11.437 ms 11.465 ms]
                        change: [+17.779% +18.446% +19.099%] (p = 0.00 < 0.02)
                        Performance has regressed.
Found 21 outliers among 200 measurements (10.50%)
  11 (5.50%) high mild
  10 (5.00%) high severe
Linfa_linear/GLM-10Feats/100000
                        time:   [33.518 ms 33.642 ms 33.771 ms]
                        change: [-93.703% -93.651% -93.603%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 5 outliers among 200 measurements (2.50%)
  5 (2.50%) high mild

Linfa-Ica

running 6 tests
iiiiii
test result: ok. 0 passed; 0 failed; 6 ignored; 0 measured; 0 filtered out; finished in 0.00s

Fast ICA/GFunc_Cube/1000
                        time:   [55.927 µs 56.518 µs 57.137 µs]
                        change: [-11.596% -10.955% -10.236%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 10 outliers among 200 measurements (5.00%)
  4 (2.00%) low mild
  2 (1.00%) high mild
  4 (2.00%) high severe
Benchmarking Fast ICA/GFunc_Cube/10000: Warming up for 10.000 s
Warning: Unable to complete 200 samples in 10.0s. You may wish to increase target time to 11.6s, enable flat sampling, or reduce sample count to 130.
Fast ICA/GFunc_Cube/10000
                        time:   [575.12 µs 578.19 µs 581.59 µs]
                        change: [-3.3082% -2.3491% -1.2956%] (p = 0.00 < 0.02)
                        Change within noise threshold.
Found 16 outliers among 200 measurements (8.00%)
  7 (3.50%) high mild
  9 (4.50%) high severe
Fast ICA/GFunc_Cube/100000
                        time:   [12.045 ms 12.805 ms 13.586 ms]
                        change: [+21.924% +29.388% +36.396%] (p = 0.00 < 0.02)
                        Performance has regressed.

Fast ICA/GFunc_Logcosh/1000
                        time:   [66.099 µs 66.328 µs 66.647 µs]
                        change: [-15.872% -15.317% -14.640%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 30 outliers among 200 measurements (15.00%)
  10 (5.00%) high mild
  20 (10.00%) high severe
Benchmarking Fast ICA/GFunc_Logcosh/10000: Warming up for 10.000 s
Warning: Unable to complete 200 samples in 10.0s. You may wish to increase target time to 13.6s, enable flat sampling, or reduce sample count to 120.
Fast ICA/GFunc_Logcosh/10000
                        time:   [665.90 µs 668.46 µs 671.50 µs]
                        change: [-16.326% -15.672% -14.836%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 19 outliers among 200 measurements (9.50%)
  7 (3.50%) high mild
  12 (6.00%) high severe
Fast ICA/GFunc_Logcosh/100000
                        time:   [17.536 ms 18.243 ms 18.942 ms]
                        change: [+63.337% +69.969% +76.014%] (p = 0.00 < 0.02)
                        Performance has regressed.

Fast ICA/Exp/1000       time:   [77.234 µs 77.775 µs 78.350 µs]
                        change: [-19.189% -18.694% -18.242%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 3 outliers among 200 measurements (1.50%)
  3 (1.50%) high mild
Benchmarking Fast ICA/Exp/10000: Warming up for 10.000 s
Warning: Unable to complete 200 samples in 10.0s. You may wish to increase target time to 13.6s, enable flat sampling, or reduce sample count to 120.
Fast ICA/Exp/10000      time:   [679.81 µs 695.72 µs 715.86 µs]
                        change: [-25.587% -22.674% -19.380%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 30 outliers among 200 measurements (15.00%)
  9 (4.50%) high mild
  21 (10.50%) high severe
Fast ICA/Exp/100000     time:   [24.578 ms 24.609 ms 24.647 ms]
                        change: [+43.108% +43.945% +44.646%] (p = 0.00 < 0.02)
                        Performance has regressed.
Found 11 outliers among 200 measurements (5.50%)
  6 (3.00%) high mild
  5 (2.50%) high severe

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

These new benchmark results look strange. Some of the benchmarks have around a +40% runtime, which makes it seem like BLAS is a lot slower.
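For reading these numbers: Criterion reports each change relative to the previous run, so with the non-BLAS run first, a positive change means the BLAS run took longer. A small sketch that pulls the three percentages out of a `change:` line and converts the middle estimate into a run-time ratio; `parse_change` and `time_ratio` are hypothetical helpers, not part of Criterion.

```rust
/// Parses a Criterion `change:` line into (lower, estimate, upper), in percent.
fn parse_change(line: &str) -> Option<(f64, f64, f64)> {
    let start = line.find('[')? + 1;
    let end = line.find(']')?;
    if end < start {
        return None;
    }
    let mut values = line[start..end]
        .split_whitespace()
        .map(|token| token.trim_end_matches('%').parse::<f64>().ok());
    let lower = values.next()??;
    let estimate = values.next()??;
    let upper = values.next()??;
    Some((lower, estimate, upper))
}

/// Ratio of the new run time to the baseline run time for a change in percent.
fn time_ratio(change_percent: f64) -> f64 {
    1.0 + change_percent / 100.0
}

fn main() {
    // Lines copied from the Linfa PLS and Linfa-linear logs above.
    let lines = [
        "change: [+66.261% +66.774% +67.292%] (p = 0.00 < 0.05)",
        "change: [-93.703% -93.651% -93.603%] (p = 0.00 < 0.02)",
    ];
    for line in lines {
        if let Some((_, estimate, _)) = parse_change(line) {
            println!(
                "{:+.3}% -> BLAS run took {:.2}x the non-BLAS time",
                estimate,
                time_ratio(estimate)
            );
        }
    }
}
```

Read this way, a +40% entry means the BLAS run took about 1.4 times as long as the non-BLAS baseline for that benchmark.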

Hmm, I reran it as a sanity check with the following changes and got the results below:

Disabled a few start-up applications
Switched the power mode from balanced to performance while plugged in
Restarted my computer and waited a little while for the boot process to finish
Monitored CPU/disk/memory every so often during the benchmarks. Nothing wild there; usage was pretty consistent across all benches

New results

criterion.zip

Just noticed the above doesn't contain graphs. If you'd still like those, I can rerun with cargo-criterion to get them and update the benchmarking section of the contribute.md

Linfa-ica

Fast ICA/GFunc_Cube/1000
                        time:   [53.628 µs 54.549 µs 55.637 µs]
                        change: [-18.316% -15.567% -12.326%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 30 outliers among 200 measurements (15.00%)
  13 (6.50%) high mild
  17 (8.50%) high severe
Benchmarking Fast ICA/GFunc_Cube/10000: Warming up for 10.000 s
Warning: Unable to complete 200 samples in 10.0s. You may wish to increase target time to 11.1s, enable flat sampling, or reduce sample count to 130.
Fast ICA/GFunc_Cube/10000
                        time:   [544.05 µs 564.51 µs 588.85 µs]
                        change: [-21.221% -18.213% -15.100%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 27 outliers among 200 measurements (13.50%)
  12 (6.00%) high mild
  15 (7.50%) high severe
Fast ICA/GFunc_Cube/100000
                        time:   [10.846 ms 11.265 ms 11.692 ms]
                        change: [-18.176% -13.527% -8.7968%] (p = 0.00 < 0.02)
                        Performance has improved.

Fast ICA/GFunc_Logcosh/1000
                        time:   [71.111 µs 73.378 µs 76.084 µs]
                        change: [-25.449% -21.126% -16.787%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 31 outliers among 200 measurements (15.50%)
  10 (5.00%) high mild
  21 (10.50%) high severe
Benchmarking Fast ICA/GFunc_Logcosh/10000: Warming up for 10.000 s
Warning: Unable to complete 200 samples in 10.0s. You may wish to increase target time to 15.8s, enable flat sampling, or reduce sample count to 110.
Fast ICA/GFunc_Logcosh/10000
                        time:   [714.59 µs 739.90 µs 770.62 µs]
                        change: [-21.415% -17.674% -13.566%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 26 outliers among 200 measurements (13.00%)
  13 (6.50%) high mild
  13 (6.50%) high severe
Fast ICA/GFunc_Logcosh/100000
                        time:   [10.322 ms 10.499 ms 10.701 ms]
                        change: [-9.4825% -7.0584% -4.5651%] (p = 0.00 < 0.02)
                        Change within noise threshold.
Found 21 outliers among 200 measurements (10.50%)
  10 (5.00%) high mild
  11 (5.50%) high severe

Fast ICA/Exp/1000       time:   [81.798 µs 84.377 µs 87.337 µs]
                        change: [-27.709% -24.513% -21.232%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 34 outliers among 200 measurements (17.00%)
.7s, enable flat sampling, or reduce sample count to 110.
Fast ICA/Exp/10000      time:   [747.95 µs 775.90 µs 811.36 µs]
                        change: [-30.952% -27.691% -24.440%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 18 outliers among 200 measurements (9.00%)
  7 (3.50%) high mild
  11 (5.50%) high severe
Fast ICA/Exp/100000     time:   [16.089 ms 16.508 ms 16.962 ms]
                        change: [-22.663% -18.994% -15.249%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 19 outliers among 200 measurements (9.50%)
  4 (2.00%) high mild
  15 (7.50%) high severe

running 0 tests

Linfa-linear


Linfa_linear/OLS-5Feats/1000
                        time:   [16.100 µs 17.669 µs 19.373 µs]
                        change: [+1.4135% +9.4220% +18.296%] (p = 0.01 < 0.02)
                        Change within noise threshold.
Found 35 outliers among 200 measurements (17.50%)
  13 (6.50%) high mild
  22 (11.00%) high severe
Linfa_linear/GLM-5Feats/1000
                        time:   [264.29 µs 269.09 µs 274.75 µs]
                        change: [-32.986% -30.777% -28.649%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 17 outliers among 200 measurements (8.50%)
  4 (2.00%) high mild
  13 (6.50%) high severe
Linfa_linear/OLS-5Feats/10000
                        time:   [181.90 µs 189.21 µs 197.10 µs]
                        change: [-7.4067% -1.4866% +4.9617%] (p = 0.61 > 0.02)
                        No change in performance detected.
Found 16 outliers among 200 measurements (8.00%)
  13 (6.50%) high mild
  3 (1.50%) high severe
Linfa_linear/GLM-5Feats/10000
                        time:   [2.5267 ms 2.6327 ms 2.7492 ms]
                        change: [-23.462% -18.803% -13.714%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 30 outliers among 200 measurements (15.00%)
  7 (3.50%) high mild
  23 (11.50%) high severe
Linfa_linear/OLS-5Feats/100000
                        time:   [3.5774 ms 3.6597 ms 3.7516 ms]
                        change: [-1.8761% +1.0010% +4.1621%] (p = 0.45 > 0.02)
                        No change in performance detected.
Found 24 outliers among 200 measurements (12.00%)
  7 (3.50%) high mild
  17 (8.50%) high severe
Linfa_linear/GLM-5Feats/100000
                        time:   [30.049 ms 30.786 ms 31.600 ms]
                        change: [-28.585% -26.179% -23.484%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 26 outliers among 200 measurements (13.00%)
  6 (3.00%) high mild
  20 (10.00%) high severe
Linfa_linear/OLS-10Feats/100000
                        time:   [9.9221 ms 10.092 ms 10.279 ms]
                        change: [-15.483% -11.905% -8.0618%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 13 outliers among 200 measurements (6.50%)
  7 (3.50%) high mild
  6 (3.00%) high severe
Linfa_linear/GLM-10Feats/100000
                        time:   [42.114 ms 43.135 ms 44.247 ms]
                        change: [-42.517% -40.842% -39.047%] (p = 0.00 < 0.02)
                        Performance has improved.
Found 19 outliers among 200 measurements (9.50%)
  12 (6.00%) high mild
  7 (3.50%) high severe
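
To read the `change:` lines above as speedup factors, here is a minimal sketch of the conversion. It assumes criterion's relative change is `new_time / old_time - 1`, so a change of -40.842% means the new run takes about 59% of the baseline time. The log itself does not say which configuration (BLAS or non-BLAS) was the saved baseline, so "old" and "new" are kept generic, and the function name is only illustrative.

```rust
/// Convert a relative change in percent (as printed on a `change:` line)
/// into a speedup factor old_time / new_time.
/// Values above 1.0 mean the new run is faster than the baseline.
fn speedup_from_change(change_percent: f64) -> f64 {
    // new_time = old_time * (1 + change / 100)
    1.0 / (1.0 + change_percent / 100.0)
}

/// Convert a [lower, estimate, upper] change interval into speedup factors.
/// The bounds swap places because a more negative change is a larger speedup.
fn speedup_interval(change: [f64; 3]) -> [f64; 3] {
    [
        speedup_from_change(change[2]),
        speedup_from_change(change[1]),
        speedup_from_change(change[0]),
    ]
}
```

For example, the GLM-10Feats/100000 point estimate of -40.842% corresponds to a speedup of roughly 1.69x, while the OLS-5Feats/100000 estimate of +1.0010% is a factor just under 1.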

Linfa-pls

Linfa_pls/Regression-Nipals-5Feats/1000
                        time:   [332.40 µs 339.32 µs 347.45 µs]
                        change: [-9.3788% -3.3968% +2.1272%] (p = 0.23 > 0.05)
                        No change in performance detected.
Found 32 outliers among 200 measurements (16.00%)
  6 (3.00%) high mild
  26 (13.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/1000
                        time:   [10.733 ns 10.837 ns 10.952 ns]
                        change: [-3.2810% -1.1013% +1.1901%] (p = 0.28 > 0.05)
                        No change in performance detected.
Found 11 outliers among 200 measurements (5.50%)
  9 (4.50%) high mild
  2 (1.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/1000
                        time:   [11.561 ns 11.974 ns 12.400 ns]
                        change: [+2.5840% +5.3203% +7.9650%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 200 measurements (10.00%)
  20 (10.00%) high mild
Linfa_pls/Regression-Nipals-5Feats/10000
                        time:   [3.4579 ms 3.5785 ms 3.7099 ms]
                        change: [-8.1025% -3.6560% +0.9395%] (p = 0.09 > 0.05)
                        No change in performance detected.
Found 33 outliers among 200 measurements (16.50%)
  9 (4.50%) high mild
  24 (12.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/10000
                        time:   [10.836 ns 11.059 ns 11.316 ns]
                        change: [-5.2675% +2.0760% +7.2764%] (p = 0.56 > 0.05)
                        No change in performance detected.
Found 31 outliers among 200 measurements (15.50%)
  7 (3.50%) high mild
  24 (12.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/10000
                        time:   [10.866 ns 11.006 ns 11.169 ns]
                        change: [+0.6067% +4.6806% +11.108%] (p = 0.04 < 0.05)
                        Change within noise threshold.
Found 14 outliers among 200 measurements (7.00%)
  9 (4.50%) high mild
  5 (2.50%) high severe
Linfa_pls/Regression-Nipals-5Feats/100000
                        time:   [36.568 ms 37.228 ms 37.971 ms]
                        change: [-11.704% -9.0450% -6.1640%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 23 outliers among 200 measurements (11.50%)
  13 (6.50%) high mild
  10 (5.00%) high severe
Linfa_pls/Canonical-Nipals-5Feats/100000
                        time:   [10.766 ns 10.957 ns 11.190 ns]
                        change: [+1.1430% +3.1624% +5.2441%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 18 outliers among 200 measurements (9.00%)
  6 (3.00%) high mild
  12 (6.00%) high severe
Linfa_pls/Cca-Nipals-5Feats/100000
                        time:   [10.813 ns 10.965 ns 11.136 ns]
                        change: [+1.3988% +2.8733% +4.5508%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 12 outliers among 200 measurements (6.00%)
  4 (2.00%) high mild
  8 (4.00%) high severe
Linfa_pls/Regression-Nipals-10Feats/100000
                        time:   [48.517 ms 49.261 ms 50.071 ms]
                        change: [-9.5216% -7.2586% -4.9307%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 21 outliers among 200 measurements (10.50%)
  14 (7.00%) high mild
  7 (3.50%) high severe
Linfa_pls/Canonical-Nipals-10Feats/100000
                        time:   [12.367 ns 12.652 ns 12.958 ns]
                        change: [+15.737% +18.169% +20.209%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 200 measurements (6.00%)
  10 (5.00%) high mild
  2 (1.00%) high severe
Linfa_pls/Cca-Nipals-10Feats/100000
                        time:   [11.463 ns 11.820 ns 12.201 ns]
                        change: [+1.9071% +4.3553% +7.1001%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 15 outliers among 200 measurements (7.50%)
  15 (7.50%) high mild
Linfa_pls/Regression-Svd-5Feats/1000
                        time:   [316.38 µs 325.86 µs 336.43 µs]
                        change: [-9.2796% -5.3762% -1.6286%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 24 outliers among 200 measurements (12.00%)
  6 (3.00%) high mild
  18 (9.00%) high severe
Linfa_pls/Canonical-Svd-5Feats/1000
                        time:   [10.764 ns 10.901 ns 11.069 ns]
                        change: [-0.1394% +2.1162% +4.6039%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 16 outliers among 200 measurements (8.00%)
  5 (2.50%) high mild
  11 (5.50%) high severe
Linfa_pls/Cca-Svd-5Feats/1000
                        time:   [10.856 ns 11.018 ns 11.195 ns]
                        change: [+2.4001% +4.8511% +7.6410%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 27 outliers among 200 measurements (13.50%)
  10 (5.00%) high mild
  17 (8.50%) high severe
Linfa_pls/Regression-Svd-5Feats/10000
                        time:   [3.2120 ms 3.3163 ms 3.4310 ms]
                        change: [+0.4988% +4.2011% +7.8310%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 36 outliers among 200 measurements (18.00%)
  11 (5.50%) high mild
  25 (12.50%) high severe
Linfa_pls/Canonical-Svd-5Feats/10000
                        time:   [11.028 ns 11.299 ns 11.604 ns]
                        change: [-2.3799% -0.1380% +2.2186%] (p = 0.90 > 0.05)
                        No change in performance detected.
Found 18 outliers among 200 measurements (9.00%)
  11 (5.50%) high mild
  7 (3.50%) high severe
Linfa_pls/Cca-Svd-5Feats/10000
                        time:   [11.117 ns 11.392 ns 11.688 ns]
                        change: [+5.1226% +8.1209% +11.584%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 23 outliers among 200 measurements (11.50%)
  12 (6.00%) high mild
  11 (5.50%) high severe
Linfa_pls/Regression-Svd-5Feats/100000
                        time:   [34.748 ms 35.355 ms 36.050 ms]
                        change: [-16.349% -13.904% -11.395%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 200 measurements (9.50%)
  6 (3.00%) high mild
  13 (6.50%) high severe
Linfa_pls/Canonical-Svd-5Feats/100000
                        time:   [11.100 ns 11.332 ns 11.591 ns]
                        change: [+3.8935% +6.7934% +9.5674%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 23 outliers among 200 measurements (11.50%)
  22 (11.00%) high mild
  1 (0.50%) high severe
Linfa_pls/Cca-Svd-5Feats/100000
                        time:   [10.868 ns 11.004 ns 11.155 ns]
                        change: [+1.0660% +3.6581% +6.4764%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 20 outliers among 200 measurements (10.00%)
  4 (2.00%) high mild
  16 (8.00%) high severe
Linfa_pls/Regression-Svd-10Feats/100000
                        time:   [46.033 ms 47.427 ms 48.927 ms]
                        change: [-22.650% -19.176% -15.678%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 200 measurements (6.50%)
  13 (6.50%) high mild
Linfa_pls/Canonical-Svd-10Feats/100000
                        time:   [10.891 ns 11.065 ns 11.271 ns]
                        change: [-21.951% -19.650% -17.248%] (p = 0.00 < 0.05)
                        Performance has improved.
Linfa_pls/Cca-Svd-10Feats/100000
                        time:   [11.073 ns 11.306 ns 11.570 ns]
                        change: [-3.0381% -0.1359% +2.8312%] (p = 0.91 > 0.05)
                        No change in performance detected.
Found 25 outliers among 200 measurements (12.50%)
  7 (3.50%) high mild
  18 (9.00%) high severe
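
For anyone wondering why two similar-looking intervals get different verdicts in the linfa-pls output (for example "Performance has improved" versus "Change within noise threshold"), here is a rough sketch of the decision rule as I understand it. This is my own illustration, not criterion's code. The 5% noise threshold is an assumption inferred from the linfa-pls results above (it matches every verdict in that section), and the 0.05 significance level is read off the `p = ... < 0.05` text. The names `Verdict` and `classify` are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// Classify a change interval given as fractions (e.g. -0.09 for -9%).
/// `significance` and `noise` are assumed settings, not values read from
/// any real configuration.
fn classify(lower: f64, upper: f64, p_value: f64, significance: f64, noise: f64) -> Verdict {
    if p_value >= significance {
        // The difference is not statistically significant.
        return Verdict::NoChange;
    }
    if lower < -noise && upper < -noise {
        // The whole interval lies below the negative noise bound.
        Verdict::Improved
    } else if lower > noise && upper > noise {
        // The whole interval lies above the positive noise bound.
        Verdict::Regressed
    } else {
        // Significant, but the interval touches the noise band.
        Verdict::WithinNoise
    }
}
```

Under these assumptions, Regression-Nipals-5Feats/100000 with [-11.704% -6.1640%] lies entirely below -5% and is reported as improved, while Regression-Nipals-10Feats/100000 with [-9.5216% -4.9307%] has its upper bound just inside the band and lands within noise.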
