🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers, with full support for Optimum's hardware optimizations & quantization schemes.
I want to benchmark GGUF models in terms of energy, latency, and memory using the LlamaCpp backend. How do I get started? Is there any example code or instructions?
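A minimal sketch of what such a benchmark script could look like, modeled on the PyTorch launch example in the project README. It assumes that a `LlamaCppConfig` backend config is exposed alongside the other config classes, and the model repo, GGUF filename, and shape/generation parameters below are illustrative placeholders, not tested values:

```python
# Sketch: benchmarking a GGUF model with the LlamaCpp backend via the Python API.
# Assumes LlamaCppConfig is importable from optimum_benchmark; the model repo,
# filename, and scenario parameters are placeholders to adapt to your setup.
from optimum_benchmark import Benchmark, BenchmarkConfig, InferenceConfig, LlamaCppConfig, ProcessConfig
from optimum_benchmark.logging_utils import setup_logging

if __name__ == "__main__":
    setup_logging(level="INFO")

    # Run the benchmark in an isolated child process.
    launcher_config = ProcessConfig()

    # Track latency, memory, and energy for a text-generation workload.
    scenario_config = InferenceConfig(
        latency=True,
        memory=True,
        energy=True,
        input_shapes={"batch_size": 1, "sequence_length": 256},
        generate_kwargs={"max_new_tokens": 100, "min_new_tokens": 100},
    )

    # Point the LlamaCpp backend at a GGUF file on the Hub (placeholder repo/file).
    backend_config = LlamaCppConfig(
        device="cpu",
        task="text-generation",
        model="TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF",
        filename="tinyllama-1.1b-chat-v1.0.Q4_0.gguf",
    )

    benchmark_config = BenchmarkConfig(
        name="llama_cpp_text_generation",
        launcher=launcher_config,
        scenario=scenario_config,
        backend=backend_config,
    )

    benchmark_report = Benchmark.launch(benchmark_config)
    benchmark_report.log()
```

Running this downloads the GGUF file, executes the generation scenario, and logs per-phase latency, memory, and energy measurements in the benchmark report.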
It works, thank you! Could I check my llama_cpp_text_generation.py into the examples folder as an example Python script for running LlamaCpp backend benchmarks on energy, latency, and memory?