A comparative benchmarking library for scikit-learn's estimators
sklearn_benchmarks is a library to benchmark scikit-learn's estimators against competing implementations. It can be used through a Python API or through a command-line interface, as described below.
See benchmark results here.
In order to set up the environment, you need to have conda installed. See instructions here.
To get a local copy up and running follow these simple example steps:
$ git clone -b master --single-branch https://github.com/mbatoul/sklearn_benchmarks
$ cd sklearn_benchmarks
$ conda env create --file bench_environment.yml
$ conda activate sklbench_dev
$ pip install .
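You can check that the command-line interface is available by printing the help of the run command (the full option lists are reproduced at the end of this document):
$ python -m sklearn_benchmarks.cli run --help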
The benchmark script consumes the configuration file (config.yml by default), which contains the specification of the scikit-learn estimators to benchmark, their hyperparameters and the datasets. For example:
- name: sklearn_KNeighborsClassifier_brute_force
  estimator: sklearn.neighbors.KNeighborsClassifier
  strategy: hp_match
  predict_with_onnx: True
  parameters:
    n_neighbors:
      - 1
      - 5
      - 100
    algorithm:
      - brute
    n_jobs:
      - -1
      - 1
  metrics:
    - accuracy_score
  datasets:
    - generator: sklearn.datasets.make_classification
      generator_parameters:
        n_classes: 2
        n_redundant: 0
        n_features: 100
      n_samples_train:
        - 100_000
      n_samples_test:
        - 1
        - 1000
    - generator: sklearn.datasets.make_classification
      generator_parameters:
        n_classes: 2
        n_redundant: 0
        n_features: 2
      n_samples_train:
        - 100_000
      n_samples_test:
        - 1
        - 1000
Each entry of the estimators list describes an estimator to benchmark:
- name: identifies the estimator; it is used as the name of the benchmark run result file
- estimator: path to the estimator class
- predict_with_onnx: when set to True, predictions will also be made with ONNX (for estimators respecting the scikit-learn API only)
- parameters: grid of hyperparameters to explore in the benchmarks (see the sketch below)
- metrics: the metrics used to compute the scores (functions of the sklearn.metrics module)
- datasets: list of datasets
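As an illustration only (this is a sketch, not the library's actual internals), the first estimator entry above roughly amounts to timing fit and predict for every combination of the hyperparameter grid on data produced by the dataset generator, here using the largest n_samples_test value (1000):

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import ParameterGrid
from sklearn.neighbors import KNeighborsClassifier

# Hyperparameter grid taken from the `parameters` entry above.
grid = ParameterGrid(
    {"n_neighbors": [1, 5, 100], "algorithm": ["brute"], "n_jobs": [-1, 1]}
)

# First `datasets` entry: draw n_samples_train + n_samples_test samples
# from the generator, then split them.
n_train, n_test = 100_000, 1_000
X, y = make_classification(
    n_samples=n_train + n_test, n_classes=2, n_redundant=0, n_features=100
)
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

for params in grid:
    estimator = KNeighborsClassifier(**params)
    estimator.fit(X_train, y_train)                 # what the benchmark would time
    y_pred = estimator.predict(X_test)              # what the benchmark would time
    print(params, accuracy_score(y_test, y_pred))   # reported metric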
Benchmark results are stored in the output directory (set with the output_dir entry in the benchmarking section of the configuration file or with the --output-dir CLI option). Each run result folder is named with a timestamp corresponding to the date and time of the run (e.g. 20220313T171732).

Run benchmarks for all estimators with profiling
python -m sklearn_benchmarks.cli run --profile
Run benchmarks for a pair of estimators with SLURM
python -m sklearn_benchmarks.cli run --estimator sklearn_Ridge --estimator sklearnex_Ridge --slurm slurm_config.yml
Example of SLURM config file:
logs_dir: slurm_logs
parameters:
  timeout_min: 70
  slurm_partition: normal
  slurm_gpus_per_task: 0
  slurm_additional_parameters:
    hint: nomultithread
  slurm_exclusive: True
  slurm_cpus_per_task: 40
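The parameter names in this file resemble the keyword arguments of submitit's AutoExecutor. As an illustration only (this README does not document the internals, so a submitit-based backend is an assumption here), such a configuration could be applied roughly like this:

import submitit
import yaml

# Sketch only: assumes a submitit-based SLURM backend, which is not confirmed
# by this README. `run_one_benchmark` is a placeholder for the real work.
with open("slurm_config.yml") as f:
    slurm_config = yaml.safe_load(f)

executor = submitit.AutoExecutor(folder=slurm_config["logs_dir"])
executor.update_parameters(**slurm_config["parameters"])

def run_one_benchmark():
    # Placeholder for the actual benchmark function.
    return "done"

job = executor.submit(run_one_benchmark)
print(job.result())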
Run HPO benchmarks only
python -m sklearn_benchmarks.cli run --strategy hpo
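The --time-budget option documented below can be combined with this, for example to set a time budget of 600 seconds for the HPO benchmarks:
python -m sklearn_benchmarks.cli run --strategy hpo --time-budget 600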
Set a custom output directory
python -m sklearn_benchmarks.cli run --output-dir ~/dev/sklbench_results
Generate reports for one run
python -m sklearn_benchmarks.cli report results/local/20220303T132031
The run directory passed as an argument to the report command must be valid: it must contain a benchmarking folder with CSV files, as well as the files env_info.txt, time_most_recent_run.txt and versions.txt.
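For illustration, a valid run directory could look like the following (the CSV file name shown is hypothetical; per the configuration section above, result files are named after the name of each estimator entry):

results/local/20220303T132031/
    benchmarking/
        sklearn_KNeighborsClassifier_brute_force.csv
        ...
    env_info.txt
    time_most_recent_run.txt
    versions.txt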
Generate reports for all local runs
python -m sklearn_benchmarks.cli report results/local
The results/local folder will be traversed to identify all the valid run directories it contains (see above). A report will be generated for each of them. Complex folder trees with multiple levels of nesting are supported.
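As an illustration only (not the library's actual code), the traversal amounts to something like the following, using the validity criteria described above:

from pathlib import Path

REQUIRED_FILES = {"env_info.txt", "time_most_recent_run.txt", "versions.txt"}

def find_valid_run_dirs(root):
    # A run directory is valid if it contains a `benchmarking` folder
    # and the three metadata files listed above.
    for candidate in sorted(p for p in Path(root).rglob("*") if p.is_dir()):
        has_benchmarking = (candidate / "benchmarking").is_dir()
        has_metadata = REQUIRED_FILES <= {f.name for f in candidate.iterdir()}
        if has_benchmarking and has_metadata:
            yield candidate

for run_dir in find_valid_run_dirs("results/local"):
    print(run_dir)  # a report would be generated for each of these directories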
run
Usage: python -m sklearn_benchmarks.cli run [OPTIONS]
Run benchmarks for estimators specified in the configuration file.
Options:
-r, --output-dir TEXT Path to directory where benchmark results
should be stored.
-p, --profile Activate profiling of functions.
-c, --config TEXT Path to benchmarks configuration file.
-e, --estimator TEXT Select estimator(s) to benchmark from
configuration file. By default, they will all
be run.
-s, --strategy [hp_match|hpo] Select estimators by benchmarking strategies.
-tb, --time-budget INTEGER Custom time budget for HPO benchmarks in
seconds.
-sl, --slurm TEXT Run benchmarks using SLURM with configuration
from the given config file.
-h, --help Show this message and exit.
report
Usage: python -m sklearn_benchmarks.cli report [OPTIONS] [DIR]...
Generate HTML reports.
Options:
-c, --config TEXT Path to benchmarks configuration file.
-h, --help Show this message and exit.