huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

[For your information] Run onnx models of distil-whisper with sherpa-onnx #22

Open csukuangfj opened 1 year ago

csukuangfj commented 1 year ago

FYI: We have added support for exporting distil-whisper to ONNX and running it with sherpa-onnx.

You can find a Colab notebook below for illustration.

sherpa-onnx is implemented in C++ and provides APIs for various languages, e.g., Python, C#, Go, C, Kotlin, and Swift. It supports Windows, Linux, and macOS, as well as Android, iOS, Raspberry Pi, etc.

The current medium model is still very large, and its RTF is greater than 1 on CPU.

We hope that the tiny/base/small models will be available soon.

csukuangfj commented 1 year ago

If you want to run it on GPU, please see the following colab

https://github.com/k2-fsa/colab/blob/master/sherpa-onnx/Run_distil_whisper_with_sherpa_onnx_on_GPU_ipynb.ipynb

egorsmkv commented 1 year ago

> If you want to run it on GPU, please see the following colab
>
> https://github.com/k2-fsa/colab/blob/master/sherpa-onnx/Run_distil_whisper_with_sherpa_onnx_on_GPU_ipynb.ipynb

It's strange that GPU has a lower RTF value:

Real time factor (RTF): 27.556 / 28.165 = 0.978

csukuangfj commented 1 year ago

Why is it strange?

Lower RTF -> Faster
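To make the relationship concrete, RTF is simply processing time divided by audio duration, so any value below 1 means faster than real time. A minimal sketch, using the two durations quoted from the GPU run above:

```python
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: processing time divided by audio duration.

    RTF < 1 means the system decodes faster than real time;
    RTF > 1 means it is slower than real time.
    """
    return processing_seconds / audio_seconds

# Numbers from the GPU run quoted in this thread.
print(round(rtf(27.556, 28.165), 3))  # 0.978 -> slightly faster than real time
```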

egorsmkv commented 1 year ago

> Why is it strange?
>
> Lower RTF -> Faster

Sorry, you are right. I misunderstood that.