Vision language model support

huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Apache License 2.0

260 stars, 48 forks

Vision language model support #295

Open merveenoyan opened 3 days ago

merveenoyan commented 3 days ago

Hello! 💗 When trying to run benchmarks on vision language models (image-text-to-text), I realized this library doesn't support this task. It would be nice to have support for it, since these models are almost as mainstream as LLMs.

IlyasMoutawwakil commented 3 days ago

Hi, done in https://github.com/huggingface/optimum-benchmark/pull/296 🤗 I remember giving you my word, so I had to do it 🔥 I also added a small config for testing so that we can iterate on it and test it.