huggingface / open_asr_leaderboard


Slow Faster-Whisper #45

Open · Deep-unlearning opened this issue 2 weeks ago

Deep-unlearning commented 2 weeks ago

Hello,

I attempted to run evaluations for Faster-Whisper from https://github.com/huggingface/open_asr_leaderboard/tree/main/ctranslate2

However, I observed that it was significantly slower than the original Whisper.

This is what I got for tiny.en on hf-audio-esb-datasets-test-only-sorted_ami_test:

Faster-Whisper: WER: 23.5 %, RTFx: 52.97
Original Whisper: WER: 24.24 %, RTFx: 214.27

Are there known issues causing slower runtime?

Also, I noticed that the evals for Faster-Whisper are run with batch_size=1; is this intentional?

Note: I noticed that the transcribe function uses a default beam_size=5. Even after changing it to beam_size=1, it remained slower than the original Whisper.
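For reference, this is roughly how I'm timing things (a minimal sketch, not the leaderboard's exact harness; the model size, audio path, and device are placeholders):

```python
import time
from faster_whisper import WhisperModel

# Placeholder setup; the leaderboard harness differs in its details.
model = WhisperModel("tiny.en", device="cuda", compute_type="float16")

start = time.perf_counter()
# Default is beam_size=5; beam_size=1 makes decoding greedy.
segments, info = model.transcribe("sample.wav", beam_size=1)
# transcribe() returns a generator, so inference only runs as we consume it.
text = " ".join(segment.text for segment in segments)
elapsed = time.perf_counter() - start

# RTFx = seconds of audio transcribed per second of wall-clock time
# (higher is faster).
print(f"RTFx: {info.duration / elapsed:.2f}")
```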

Thanks!

yuekaizhang commented 1 week ago

@Deep-unlearning FYI, I have a batched Faster-Whisper here: https://github.com/yuekaizhang/open_asr_leaderboard/blob/sherpa/tensorrtllm/run_faster_whisper.sh.

yuekaizhang commented 1 week ago

See https://github.com/yuekaizhang/open_asr_leaderboard/blob/sherpa/tensorrtllm/run_faster_whisper_eval.py#L45.
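For anyone else landing here: newer faster-whisper releases also ship a BatchedInferencePipeline. A minimal sketch of batched decoding, with model size, batch size, and audio path as placeholders:

```python
from faster_whisper import WhisperModel, BatchedInferencePipeline

# Wrap the regular model in the batched pipeline: it chunks the audio
# (using a VAD internally) and decodes the chunks in parallel batches.
model = WhisperModel("tiny.en", device="cuda", compute_type="float16")
batched_model = BatchedInferencePipeline(model=model)

segments, info = batched_model.transcribe("sample.wav", batch_size=16)
print(" ".join(segment.text for segment in segments))
```

Larger batch sizes trade GPU memory for throughput, which is what pushes RTFx up.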

Deep-unlearning commented 1 week ago

> @Deep-unlearning FYI, I have a batched Faster-Whisper here: https://github.com/yuekaizhang/open_asr_leaderboard/blob/sherpa/tensorrtllm/run_faster_whisper.sh.

I will look into that, thanks!

yuekaizhang commented 1 week ago

However, Faster-Whisper uses a VAD internally, so you may need to implement a chunked long-form algorithm to do an apples-to-apples comparison.
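For illustration, the sequential transcribe API exposes that VAD via vad_filter / vad_parameters, so you can toggle or tune it when comparing (a sketch; the silence threshold shown is just an example value):

```python
from faster_whisper import WhisperModel

model = WhisperModel("tiny.en", device="cuda", compute_type="float16")

# Disable the Silero VAD so no audio is dropped before decoding; this is
# closer to a plain chunked long-form pass over the full recording.
segments, _ = model.transcribe("sample.wav", vad_filter=False)

# Or keep the VAD but tune when it cuts, e.g. the minimum silence gap:
segments, _ = model.transcribe(
    "sample.wav",
    vad_filter=True,
    vad_parameters={"min_silence_duration_ms": 500},
)
```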