Any plans for faster-whisper integration in ONNX + Triton?

k2-fsa / sherpa

Speech-to-text server framework with next-gen Kaldi

https://k2-fsa.github.io/sherpa

Apache License 2.0


Open haiderasad opened 5 months ago

yuekaizhang commented 4 months ago

@haiderasad We have no plans to integrate faster-whisper. I recommend trying Whisper TensorRT-LLM (https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/whisper), which is currently the fastest implementation according to the benchmark at https://github.com/shashikg/WhisperS2T?tab=readme-ov-file#benchmark-and-technical-report.

yuekaizhang commented 3 months ago

See #551. @haiderasad