huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Apache License 2.0
255 stars 48 forks

Update vllm backend to support offline and online serving modes #232

Closed. IlyasMoutawwakil closed this 4 months ago

IlyasMoutawwakil commented 4 months ago

Support both online and offline serving modes in the vLLM backend, and allow passing arbitrary engine args.
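To illustrate the idea, here is a minimal sketch of how a backend config could expose a serving mode plus a free-form dictionary of engine arguments. It is not the code from this PR: the class `VLLMConfigSketch`, the field names `serving_mode` and `engine_args`, and the helper `build_engine_kwargs` are all hypothetical and chosen for illustration only. The sketch does not import vLLM, so it only shows the config-side validation and merging.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical set of modes; the PR only states "online" and "offline".
VALID_SERVING_MODES = ("offline", "online")


@dataclass
class VLLMConfigSketch:
    """Hypothetical config: a model id, a serving mode, and pass-through engine args."""

    model: str
    serving_mode: str = "offline"
    # Arbitrary keyword arguments forwarded to the engine untouched (assumption).
    engine_args: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reject unknown modes early instead of failing at engine start-up.
        if self.serving_mode not in VALID_SERVING_MODES:
            raise ValueError(
                f"serving_mode must be one of {VALID_SERVING_MODES}, "
                f"got {self.serving_mode!r}"
            )

    def build_engine_kwargs(self) -> Dict[str, Any]:
        # The explicit `model` field takes precedence over any "model" key
        # that a user placed in engine_args.
        return {**self.engine_args, "model": self.model}


if __name__ == "__main__":
    cfg = VLLMConfigSketch(
        model="gpt2",
        serving_mode="online",
        engine_args={"max_model_len": 512},
    )
    print(cfg.serving_mode, cfg.build_engine_kwargs())
```

Keeping `engine_args` as an untyped dictionary is one way to let users pass any engine option without the benchmark config having to mirror every field; the trade-off is that typos in keys surface only when the engine is constructed.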