rust-ml / linfa

A Rust machine learning framework.

Document Benchmarking Standards #265

Closed oojo12 closed 1 year ago

oojo12 commented 1 year ago

I think it would be helpful to the community if we documented what we want from benchmarking assessments. From my experience we ideally want the following (a rough benchmark sketch follows the list):

  1. Test for a variety of sample sizes (1_000, 10_000, 100_000)
  2. Test for a variety of feature dimensions (5, 10)
  3. Test Single and Multi Target (possibly)
  4. Test the various algorithm implementations (e.g. for PLS we'd want to test both the SVD and NIPALS implementations for CCA, Canonical, and Regression)
  5. Set a random seed for the algorithm, if applicable, to make results reproducible
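
Roughly, this is the shape of what I have in mind for points 1, 2, and 5, using Criterion. It's just a sketch, not the final bench layout: the KMeans choice, the cluster count, the file name, and the seed value are placeholders, and the exact builder calls would follow whatever the crate actually exposes.

```rust
// Hypothetical benches/fit_bench.rs -- a sketch only, not the final layout.
use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion};
use linfa::traits::Fit;
use linfa::DatasetBase;
use linfa_clustering::KMeans;
use ndarray::Array2;
use ndarray_rand::{rand_distr::Uniform, RandomExt};
use rand_xoshiro::{rand_core::SeedableRng, Xoshiro256Plus};

fn bench_fit(c: &mut Criterion) {
    let mut group = c.benchmark_group("kmeans_fit");
    // Points 1 and 2: sweep sample sizes and feature counts.
    for &nsamples in &[1_000usize, 10_000, 100_000] {
        for &nfeatures in &[5usize, 10] {
            // Point 5: fixed seed so every run benchmarks the same data.
            let mut rng = Xoshiro256Plus::seed_from_u64(42);
            let records: Array2<f64> =
                Array2::random_using((nsamples, nfeatures), Uniform::new(-1., 1.), &mut rng);
            let dataset = DatasetBase::from(records);

            group.bench_with_input(
                BenchmarkId::new("fit", format!("{}x{}", nsamples, nfeatures)),
                &dataset,
                |bencher, dataset| {
                    bencher.iter(|| {
                        // Seed the model too, so initialization is reproducible.
                        KMeans::params_with_rng(3, Xoshiro256Plus::seed_from_u64(42))
                            .fit(dataset)
                            .unwrap()
                    })
                },
            );
        }
    }
    group.finish();
}

criterion_group!(benches, bench_fit);
criterion_main!(benches);
```
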
YuhanLiin commented 1 year ago
  1. 1K and 10K make sense. 100K might be too high for some algorithms, so we should test lower values instead (like 20K)
  2. For spatial algorithms where the samples typically represent real coordinates, such as KMeans, 2 and 3 are likely the most common feature counts. For those algorithms we should do something like (3, 8), just to include 3. From my experience we don't usually use a high number of features with ML algorithms, so we don't need to go above 10. The exception is dimensionality-reduction algorithms.
  3. Most algorithms only support single-target. For multi-target algorithms we only need to test the multi-target case (something like 4 targets; see the sketch after this list).
  4. Yeah this makes sense.
  5. Definitely.
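
Concretely, the parameter grid and the seeded multi-target data generation I'd expect the doc to describe look something like this. It's only a sketch under those assumptions: the constant names, the helper name, and the 42 seed are placeholders, not an existing API.

```rust
// Sketch of shared bench parameters -- names and values here are illustrative.
use ndarray::Array2;
use ndarray_rand::{rand_distr::Uniform, RandomExt};
use rand_xoshiro::{rand_core::SeedableRng, Xoshiro256Plus};

// Cap sample sizes at 20K instead of 100K.
const SAMPLE_SIZES: [usize; 3] = [1_000, 10_000, 20_000];
// Default feature sweep; spatial algorithms (e.g. KMeans) use (3, 8) instead.
const FEATURE_SIZES: [usize; 2] = [5, 10];
const SPATIAL_FEATURE_SIZES: [usize; 2] = [3, 8];
// Multi-target algorithms are benchmarked with 4 targets only.
const NUM_TARGETS: usize = 4;

/// Hypothetical helper: seeded records plus 4 targets for multi-target benches.
fn multi_target_data(nsamples: usize, nfeatures: usize) -> (Array2<f64>, Array2<f64>) {
    let mut rng = Xoshiro256Plus::seed_from_u64(42);
    let records = Array2::random_using((nsamples, nfeatures), Uniform::new(-1., 1.), &mut rng);
    let targets = Array2::random_using((nsamples, NUM_TARGETS), Uniform::new(-1., 1.), &mut rng);
    (records, targets)
}
```
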
oojo12 commented 1 year ago

Is this more or less all? If so, I'm thinking of writing a benchmarking.md for future contributors.

YuhanLiin commented 1 year ago

There's also:

oojo12 commented 1 year ago

I will say that back when I was performing ML-related tasks it wasn't uncommon to have more than 10 features for predictive analysis, especially for tree-based algorithms. However, if I'm honest, I don't recall if this was before or after dimensionality reduction. I'll go with your guidance for the documentation; we can always revise later.

YuhanLiin commented 1 year ago

I've read that many classic ML algorithms don't do well with high feature counts (something called the "curse of dimensionality"), which is why dimensionality reduction is needed. I'm not sure about the exact numbers though.