🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers, with full support for Optimum's hardware optimizations and quantization schemes.
Apache License 2.0
255 stars
48 forks
Update vllm backend to support offline and online serving modes #232
Support online and offline serving modes and arbitrary engine args
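
To make the change described above more concrete, here is a minimal sketch of how a backend configuration could expose a serving mode alongside free-form engine arguments. This is an illustration only, not the code from the pull request: the class name `VLLMBackendSketch`, its fields, and the `engine_kwargs` helper are hypothetical, and the example engine arguments are passed through unvalidated.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Hypothetical set of modes, mirroring the "offline" and "online" wording above.
SERVING_MODES = ("offline", "online")


@dataclass
class VLLMBackendSketch:
    """Illustrative config; not the actual class used by the project."""

    model: str
    serving_mode: str = "offline"
    # Arbitrary engine arguments, forwarded as-is to the engine constructor.
    engine_args: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.serving_mode not in SERVING_MODES:
            raise ValueError(
                f"serving_mode must be one of {SERVING_MODES}, got {self.serving_mode!r}"
            )

    def engine_kwargs(self) -> Dict[str, Any]:
        # The explicit model name wins over any "model" key in engine_args.
        return {**self.engine_args, "model": self.model}


config = VLLMBackendSketch(
    model="gpt2",
    serving_mode="online",
    engine_args={"max_model_len": 1024, "enforce_eager": True},
)
kwargs = config.engine_kwargs()
```

Keeping `engine_args` as an open dictionary means new engine options can be used without changing the config class, at the cost of catching typos only when the engine itself rejects them.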