huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.
Apache License 2.0

LlamaCpp backend benchmark example code #282

Open Davidqian123 opened 2 weeks ago

Davidqian123 commented 2 weeks ago

I want to run some benchmarks on GGUF models, measuring energy, latency, and memory, using the LlamaCpp backend. How do I get started? Is there any example code or instructions?

IlyasMoutawwakil commented 2 weeks ago

right here:

You can run these configs as explained in the README.

Davidqian123 commented 2 weeks ago

It works, thank you! May I check in my llama_cpp_text_generation.py into the examples folder as example Python code for running LlamaCpp backend benchmarks on energy, latency, and memory?
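
For reference, here is a minimal sketch of what such an example script could look like, based on the Python API shown in the README (`Benchmark.launch` with a `BenchmarkConfig`). It assumes `LlamaCppConfig` is importable from `optimum_benchmark` and that it accepts a GGUF `filename`; the model repo, filename, and input shapes are placeholders, not the actual merged example:

```python
# Hypothetical llama_cpp_text_generation.py sketch: benchmarks a GGUF model
# with the LlamaCpp backend, tracking latency, memory, and energy.
from optimum_benchmark import (
    Benchmark,
    BenchmarkConfig,
    InferenceConfig,
    LlamaCppConfig,  # assumed to be exported at the package top level
    ProcessConfig,
)
from optimum_benchmark.logging_utils import setup_logging

if __name__ == "__main__":
    setup_logging(level="INFO", handlers=["console"])

    # Run the benchmark in an isolated subprocess.
    launcher_config = ProcessConfig()

    # Inference scenario with latency, memory, and energy tracking enabled
    # (energy measurements rely on codecarbon under the hood).
    scenario_config = InferenceConfig(
        latency=True,
        memory=True,
        energy=True,
        input_shapes={"batch_size": 1, "sequence_length": 128},
        generate_kwargs={"max_new_tokens": 32, "min_new_tokens": 32},
    )

    # Placeholder model repo and GGUF filename; replace with the model you want to benchmark.
    backend_config = LlamaCppConfig(
        device="cpu",
        task="text-generation",
        model="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
        filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
    )

    benchmark_config = BenchmarkConfig(
        name="llama_cpp_text_generation",
        launcher=launcher_config,
        scenario=scenario_config,
        backend=backend_config,
    )

    benchmark_report = Benchmark.launch(benchmark_config)

    # Print the aggregated metrics and save the full report to disk.
    benchmark_report.log()
    benchmark_report.save_json("benchmark_report.json")
```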