m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
BSD 2-Clause "Simplified" License
12.72k stars, 1.35k forks

Feature Request: Whisper TensorRT-LLM backend support #624

Open yuekaizhang opened 11 months ago

yuekaizhang commented 11 months ago

Hi WhisperX Team, I was wondering whether you would consider supporting the TensorRT-LLM backend for Whisper (https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper). I ran several benchmark tests on https://huggingface.co/datasets/hf-internal-testing/librispeech_asr_dummy with the large-v3 model. The results are below:

| V100 GPU     | faster-whisper                                          | TRT-LLM                                   |
|--------------|---------------------------------------------------------|-------------------------------------------|
| batch size 1 | 38 s decoding time, 2.74% WER                           | 22 s decoding time, 2.40% WER             |
| batch size 4 | Batch decoding not supported (could try whisperX)       | 15 s decoding time, 2.40% WER             |

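For a quick read of the benchmark, the reported numbers can be turned into relative speedups with a few lines of Python. This only re-derives ratios from the figures in the table; it adds no new measurements. Note that the batch-size-4 TRT-LLM row is compared against faster-whisper at batch size 1, since faster-whisper has no batch-size-4 result, so that ratio mixes backend and batching effects.

```python
# Decoding times (seconds) and WER (%) as reported in the benchmark table
# (large-v3 on librispeech_asr_dummy, V100 GPU).
results = {
    ("faster-whisper", 1): (38.0, 2.74),
    ("TRT-LLM", 1): (22.0, 2.40),
    ("TRT-LLM", 4): (15.0, 2.40),
}

def speedup(baseline, candidate):
    """Ratio of the baseline's decoding time to the candidate's decoding time."""
    return results[baseline][0] / results[candidate][0]

base = ("faster-whisper", 1)
for key in [("TRT-LLM", 1), ("TRT-LLM", 4)]:
    # Negative WER change means the candidate made fewer word errors.
    wer_delta = results[key][1] - results[base][1]
    print(f"{key[0]} bs={key[1]}: {speedup(base, key):.2f}x faster, "
          f"WER change {wer_delta:+.2f} pts")
```

This works out to roughly 1.73x at batch size 1 and 2.53x with batch size 4, with WER about 0.34 points lower in both cases.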
shashikg commented 10 months ago

Dropping this link here in case anyone else is interested in Whisper's TensorRT-LLM integration with a speech-to-text pipeline: https://github.com/shashikg/WhisperS2T/releases/tag/v1.3.0

@yuekaizhang ^^

haiderasad commented 1 month ago

@shashikg @yuekaizhang, is there any TensorRT-LLM integration for transcription + diarization?