huggingface / optimum-benchmark

🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum's hardware optimizations & quantization schemes.

Evaluators for specific tasks #34

Open IlyasMoutawwakil opened 1 year ago

IlyasMoutawwakil commented 1 year ago

@regisss would it make sense to add task-specific evaluators? For example, for automatic-speech-recognition, I computed one manually when I benchmarked Whisper.

regisss commented 1 year ago

Sure, why not! Do you have task-specific perf metrics in mind? Which one did you use for the Whisper benchmark?

IlyasMoutawwakil commented 1 year ago

WER (word error rate). It's not very universal, but it's the current standard for speech recognition.

regisss commented 1 year ago

Ah yes, okay. I thought you were talking about some specific speed metrics. Maybe you can use evaluate for this: https://github.com/huggingface/evaluate
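
For reference, a minimal sketch of computing WER with evaluate's `wer` metric; the transcripts below are made-up placeholders:

```python
import evaluate

# Load the word error rate metric from the Hub
wer_metric = evaluate.load("wer")

# Placeholder transcripts: references are ground truth, predictions are model output
references = ["the quick brown fox jumps over the lazy dog"]
predictions = ["the quick brown fox jumped over a lazy dog"]

# WER = (substitutions + insertions + deletions) / number of reference words
score = wer_metric.compute(predictions=predictions, references=references)
print(f"WER: {score:.3f}")
```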

IlyasMoutawwakil commented 12 months ago

Cool, there's already a list of implemented evaluators, including automatic-speech-recognition. Now the question is whether to expose this as a separate benchmark called evaluation or as an argument of the inference benchmark, like memory. I think the latter makes sense and avoids repeating the same load/optimization/quantization work.
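
A sketch of what evaluate's task evaluator provides for automatic-speech-recognition; the model checkpoint, dataset, and column names here are illustrative assumptions, not something discussed in this issue:

```python
from datasets import load_dataset
from evaluate import evaluator

# Instantiate the ASR task evaluator from the evaluate library
task_evaluator = evaluator("automatic-speech-recognition")

# Any dataset with audio + transcript columns should work; this dummy
# LibriSpeech split is just a small example
data = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")

results = task_evaluator.compute(
    model_or_pipeline="openai/whisper-tiny.en",  # illustrative checkpoint
    data=data,
    input_column="audio",
    label_column="text",
    metric="wer",
)
print(results)  # includes the WER score plus basic timing stats
```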

regisss commented 12 months ago

I agree, the latter seems better from a UX point of view :+1:
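
For illustration, one hypothetical shape the second option could take; `EvaluationConfig` and the `evaluation` field are invented for this sketch and are not actual optimum-benchmark options:

```python
# Hypothetical sketch only: attaching evaluation to the existing inference
# benchmark as an optional field, analogous to the memory toggle, so the
# model is loaded/optimized/quantized once and reused for both.
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvaluationConfig:
    task: str = "automatic-speech-recognition"  # selects the evaluate evaluator
    metric: str = "wer"
    dataset: str = "hf-internal-testing/librispeech_asr_dummy"  # illustrative


@dataclass
class InferenceConfig:
    memory: bool = False  # existing-style toggle for memory tracking
    evaluation: Optional[EvaluationConfig] = None  # proposed addition
```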