

All benchmarks are wrong, some will cost you less than others.

Optimum-Benchmark 🏋️


Optimum-Benchmark is a unified multi-backend & multi-device utility for benchmarking Transformers, Diffusers, PEFT, TIMM and Optimum libraries, along with all their supported optimizations & quantization schemes, for inference & training, in distributed & non-distributed settings, in the most correct, efficient and scalable way possible.

News 📰

Motivations 🎯

 

> [!NOTE]
> Optimum-Benchmark is a work in progress and is not yet ready for production use, but we're working hard to make it so. Please keep an eye on the project and help us improve it and make it more useful for the community. We're looking forward to your feedback and contributions. 🚀

CI Status 🚦

Optimum-Benchmark is continuously and intensively tested on a variety of devices, backends, scenarios and launchers to ensure its stability, with over 300 tests running on every PR (you can request more tests if needed).

API 📈

`API_CPU`, `API_CUDA`, `API_MISC`, `API_ROCM`

CLI 📈

`CLI_CPU_IPEX`, `CLI_CPU_LLAMA_CPP`, `CLI_CPU_NEURAL_COMPRESSOR`, `CLI_CPU_ONNXRUNTIME`, `CLI_CPU_OPENVINO`, `CLI_CPU_PYTORCH`, `CLI_CPU_PY_TXI`, `CLI_CUDA_ONNXRUNTIME`, `CLI_CUDA_PYTORCH`, `CLI_CUDA_PY_TXI`, `CLI_CUDA_TENSORRT_LLM`, `CLI_CUDA_TORCH_ORT`, `CLI_CUDA_VLLM`, `CLI_MISC`, `CLI_ROCM_PYTORCH`

Quickstart 🚀

Installation 📥

You can install the latest released version of optimum-benchmark from PyPI:

pip install optimum-benchmark

or you can install the latest version from the main branch on GitHub:

pip install git+https://github.com/huggingface/optimum-benchmark.git

or if you want to tinker with the code, you can clone the repository and install it in editable mode:

git clone https://github.com/huggingface/optimum-benchmark.git
cd optimum-benchmark
pip install -e .
Advanced install options

Depending on the backends you want to use, you can install `optimum-benchmark` with the following extras:

- PyTorch (default): `pip install optimum-benchmark`
- OpenVINO: `pip install optimum-benchmark[openvino]`
- Torch-ORT: `pip install optimum-benchmark[torch-ort]`
- OnnxRuntime: `pip install optimum-benchmark[onnxruntime]`
- TensorRT-LLM: `pip install optimum-benchmark[tensorrt-llm]`
- OnnxRuntime-GPU: `pip install optimum-benchmark[onnxruntime-gpu]`
- Neural Compressor: `pip install optimum-benchmark[neural-compressor]`
- Py-TXI: `pip install optimum-benchmark[py-txi]`
- IPEX: `pip install optimum-benchmark[ipex]`
- vLLM: `pip install optimum-benchmark[vllm]`

We also support the following extra dependencies:

- autoawq
- auto-gptq
- sentence-transformers
- bitsandbytes
- codecarbon
- flash-attn
- deepspeed
- diffusers
- timm
- peft

Running benchmarks using the Python API 🧪

You can run benchmarks from the Python API, using the Benchmark class and its launch method. It takes a BenchmarkConfig object as input, runs the benchmark in an isolated process and returns a BenchmarkReport object containing the benchmark results.

Here's an example of how to run an isolated benchmark using the pytorch backend, torchrun launcher and inference scenario with latency and memory tracking enabled.

from optimum_benchmark import Benchmark, BenchmarkConfig, TorchrunConfig, InferenceConfig, PyTorchConfig
from optimum_benchmark.logging_utils import setup_logging

setup_logging(level="INFO", handlers=["console"])

if __name__ == "__main__":
    launcher_config = TorchrunConfig(nproc_per_node=2)
    scenario_config = InferenceConfig(latency=True, memory=True)
    backend_config = PyTorchConfig(model="gpt2", device="cuda", device_ids="0,1", no_weights=True)
    benchmark_config = BenchmarkConfig(
        name="pytorch_gpt2",
        scenario=scenario_config,
        launcher=launcher_config,
        backend=backend_config,
    )
    benchmark_report = Benchmark.launch(benchmark_config)

    # log the benchmark in terminal
    benchmark_report.log() # or print(benchmark_report)

    # convert artifacts to a dictionary or dataframe
    benchmark_config.to_dict() # or benchmark_config.to_dataframe()

    # save artifacts to disk as json or csv files
    benchmark_report.save_csv("benchmark_report.csv") # or benchmark_report.save_json("benchmark_report.json")

    # push artifacts to the hub
    benchmark_config.push_to_hub("IlyasMoutawwakil/pytorch_gpt2") # or benchmark_report.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or merge them into a single artifact
    benchmark = Benchmark(config=benchmark_config, report=benchmark_report)
    benchmark.save_json("benchmark.json") # or benchmark.save_csv("benchmark.csv")
    benchmark.push_to_hub("IlyasMoutawwakil/pytorch_gpt2")

    # load artifacts from the hub
    benchmark = Benchmark.from_hub("IlyasMoutawwakil/pytorch_gpt2")

    # or load them from disk
    benchmark = Benchmark.load_json("benchmark.json") # or Benchmark.load_csv("benchmark_report.csv")

If you're using VS Code, you can hover over the configuration classes to see the available parameters and their descriptions. The available parameters are also listed in the Features section below.

Running benchmarks using the Hydra CLI 🧪

You can also run a benchmark from the command line by specifying the configuration directory and the configuration name. Both arguments are mandatory for Hydra: `--config-dir` is the directory where the configuration files are stored and `--config-name` is the name of the configuration file without its `.yaml` extension.

optimum-benchmark --config-dir examples/ --config-name pytorch_bert

This will run the benchmark using the configuration in `examples/pytorch_bert.yaml` and store the results in `runs/pytorch_bert`.

The resulting files are:

Advanced CLI options

#### Configuration overrides 🎛️

It's easy to override the default behavior of an existing configuration file from the command line. For example, to run the same benchmark on a different model and device, you can use the following command:

```bash
optimum-benchmark --config-dir examples/ --config-name pytorch_bert backend.model=gpt2 backend.device=cuda
```

#### Configuration sweeps 🧹

You can easily run configuration sweeps using the `--multirun` option. By default, configurations are executed serially, but other kinds of execution are supported with Hydra's launcher plugins (e.g. `hydra/launcher=joblib`).

```bash
optimum-benchmark --config-dir examples --config-name pytorch_bert -m backend.device=cpu,cuda
```

### Configurations structure 📁

You can create custom and more complex configuration files following these [examples](https://github.com/IlyasMoutawwakil/optimum-benchmark-examples). They are heavily commented to help you understand the structure of the configuration files.

Features 🎨

optimum-benchmark allows you to run benchmarks with minimal configuration. A benchmark is defined by three main components:

Launchers 🚀

General launcher features 🧰

- [x] Assert GPU device (NVIDIA & AMD) isolation (`launcher.device_isolation=true`). This feature makes sure no processes other than the benchmark are running on the targeted GPU devices. Especially useful when running benchmarks on shared resources.
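From the Python API, the same option can be passed on the launcher config. Here is a minimal sketch, assuming the keyword argument mirrors the `launcher.device_isolation=true` CLI override shown above:

```python
# A minimal sketch: enabling device isolation on the launcher used in the
# Python API example above. The `device_isolation` keyword is assumed to
# mirror the `launcher.device_isolation=true` CLI override.
from optimum_benchmark import TorchrunConfig

launcher_config = TorchrunConfig(
    nproc_per_node=2,
    device_isolation=True,  # ensure no other process uses the targeted GPUs during the benchmark
)
```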

Scenarios 🏋

Inference scenario features 🧰

- [x] Memory tracking (`scenario.memory=true`)
- [x] Energy and efficiency tracking (`scenario.energy=true`)
- [x] Latency and throughput tracking (`scenario.latency=true`)
- [x] Warm up runs before inference (`scenario.warmup_runs=20`)
- [x] Input shapes control (e.g. `scenario.input_shapes.sequence_length=128`)
- [x] Forward, Call and Generate kwargs (e.g. for an LLM `scenario.generate_kwargs.max_new_tokens=100`, for a diffusion model `scenario.call_kwargs.num_images_per_prompt=4`)

See [InferenceConfig](optimum_benchmark/scenarios/inference/config.py) for more information.
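From the Python API, these knobs map to InferenceConfig arguments. Here is a hedged sketch; the dict-valued `input_shapes` and `generate_kwargs` arguments are assumptions that mirror the dotted CLI overrides above:

```python
# A minimal sketch of an inference scenario using the knobs listed above.
# The dict-valued arguments are assumed to mirror the dotted CLI overrides
# (e.g. scenario.input_shapes.sequence_length=128).
from optimum_benchmark import InferenceConfig

scenario_config = InferenceConfig(
    memory=True,     # memory tracking
    energy=True,     # energy and efficiency tracking
    latency=True,    # latency and throughput tracking
    warmup_runs=20,  # warm up runs before inference
    input_shapes={"sequence_length": 128},    # input shapes control
    generate_kwargs={"max_new_tokens": 100},  # generation kwargs for an LLM
)
```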
Training scenario features 🧰

- [x] Memory tracking (`scenario.memory=true`)
- [x] Energy and efficiency tracking (`scenario.energy=true`)
- [x] Latency and throughput tracking (`scenario.latency=true`)
- [x] Warm up steps before training (`scenario.warmup_steps=20`)
- [x] Dataset shapes control (e.g. `scenario.dataset_shapes.sequence_length=128`)
- [x] Training arguments control (e.g. `scenario.training_args.per_device_train_batch_size=4`)

See [TrainingConfig](optimum_benchmark/scenarios/training/config.py) for more information.
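A similar hedged sketch for a training scenario; the top-level TrainingConfig import and the dict-valued arguments are assumptions based on the dotted CLI overrides listed above:

```python
# A minimal sketch of a training scenario; import path and dict-valued
# arguments are assumptions mirroring the CLI overrides listed above.
from optimum_benchmark import TrainingConfig

scenario_config = TrainingConfig(
    memory=True,      # memory tracking
    latency=True,     # latency and throughput tracking
    warmup_steps=20,  # warm up steps before training
    dataset_shapes={"sequence_length": 128},           # dataset shapes control
    training_args={"per_device_train_batch_size": 4},  # training arguments control
)
```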

Backends & Devices 📱

General backend features 🧰

- [x] Device selection (`backend.device=cuda`), can be `cpu`, `cuda`, `mps`, etc.
- [x] Device ids selection (`backend.device_ids=0,1`), can be a list of device ids to run the benchmark on multiple devices.
- [x] Model selection (`backend.model=gpt2`), can be a model id from the HuggingFace model hub or an **absolute path** to a model folder.
- [x] "No weights" feature, to benchmark models without downloading their weights, using randomly initialized weights (`backend.no_weights=true`)
Backend specific features 🧰

For more information on the features of each backend, you can check their respective configuration files:

- [VLLMConfig](optimum_benchmark/backends/vllm/config.py)
- [IPEXConfig](optimum_benchmark/backends/ipex/config.py)
- [OVConfig](optimum_benchmark/backends/openvino/config.py)
- [PyTXIConfig](optimum_benchmark/backends/py_txi/config.py)
- [PyTorchConfig](optimum_benchmark/backends/pytorch/config.py)
- [ORTConfig](optimum_benchmark/backends/onnxruntime/config.py)
- [TorchORTConfig](optimum_benchmark/backends/torch_ort/config.py)
- [LLMSwarmConfig](optimum_benchmark/backends/llm_swarm/config.py)
- [TRTLLMConfig](optimum_benchmark/backends/tensorrt_llm/config.py)
- [INCConfig](optimum_benchmark/backends/neural_compressor/config.py)
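Switching backends from the Python API only changes the backend config class; everything else stays as in the PyTorch example above. Below is a hedged sketch using the ONNX Runtime backend with a single-process launcher; the top-level ORTConfig and ProcessConfig imports are assumptions (they are expected to be exposed like PyTorchConfig and TorchrunConfig):

```python
# A hedged sketch: same benchmark shape as the PyTorch example above, but with
# the ONNX Runtime backend. ORTConfig and ProcessConfig imports are assumptions.
from optimum_benchmark import Benchmark, BenchmarkConfig, InferenceConfig, ORTConfig, ProcessConfig

if __name__ == "__main__":
    benchmark_config = BenchmarkConfig(
        name="onnxruntime_gpt2",
        launcher=ProcessConfig(),  # run the benchmark in a single isolated process
        scenario=InferenceConfig(latency=True, memory=True),
        backend=ORTConfig(model="gpt2", device="cpu"),
    )
    benchmark_report = Benchmark.launch(benchmark_config)
    benchmark_report.log()
```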

Contributing 🤝

Contributions are welcome, and we're happy to help you get started. Feel free to open an issue or a pull request. Things that we'd like to see:

To get started, you can check the CONTRIBUTING.md file.