huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

How to use ONNX model? #16

Open H-G-11 opened 11 months ago

H-G-11 commented 11 months ago

Hello there,

I'm interested in using the ONNX model, since I saw that you provide the weights for it. I tried to load it with the Optimum library but couldn't get it to work. Could someone point me in the right direction?

Thank you so much for this repository and the work you put into it. It really helps!!

Here is what I tried:

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor
import torch
from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-large-v2"

model = ORTModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, encoder_file_name="encoder_model.onnx"
)

Here is the error:

RuntimeError: Too many ONNX model files were found in distil-whisper/distil-large-v2, specify which one to load by using the encoder_file_name argument.
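
One plausible reading of this error is that the loader could not settle on a single encoder file, for example because several `.onnx` files in the repository match. A first debugging step could be to list the `.onnx` files and group them by role. This is only a sketch, not a confirmed fix: `group_onnx_files` and `list_repo_onnx` are hypothetical helpers, the role rules are a guess based on common export names, and the actual file layout of the repository has to be checked. `list_repo_files` comes from `huggingface_hub`.

```python
from collections import defaultdict


def group_onnx_files(filenames):
    """Group .onnx paths by role, judged from the base file name (a heuristic)."""
    groups = defaultdict(list)
    for name in filenames:
        if not name.endswith(".onnx"):
            continue  # skip configs, tokenizer files, PyTorch weights, ...
        base = name.rsplit("/", 1)[-1]
        if base.startswith("encoder"):
            role = "encoder"
        elif "with_past" in base:
            role = "decoder_with_past"
        elif base.startswith("decoder"):
            role = "decoder"
        else:
            role = "other"
        groups[role].append(name)
    return dict(groups)


def list_repo_onnx(repo_id):
    """List and group the .onnx files of a Hub repo (needs network access)."""
    # Imported lazily so the grouping helper works without huggingface_hub.
    from huggingface_hub import list_repo_files

    return group_onnx_files(list_repo_files(repo_id))
```

Calling `list_repo_onnx("distil-whisper/distil-large-v2")` shows which candidates exist for each role. If exactly one file per role is wanted, its name could then be passed explicitly through `encoder_file_name`, `decoder_file_name`, and `decoder_with_past_file_name`, plus `subfolder` if the files live in a subdirectory. These keyword names are assumed from Optimum's loader and may differ between versions.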

csukuangfj commented 11 months ago

@HuguesGallier

I suggest you take a look at http://github.com/k2-fsa/sherpa-onnx

It supports both distil-whisper and openai-whisper.

For instance, the Colab notebook below shows how to run distil-whisper ONNX models with sherpa-onnx.

Open In Colab


Note that you can use sherpa-onnx on Windows/macOS/Linux, and it also supports Android/iOS/Raspberry Pi, etc.
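
For readers who want a rough idea of the Python side, here is a minimal sketch of transcribing a WAV file with sherpa-onnx. It assumes the `sherpa_onnx` and `numpy` packages are installed and that the distil-whisper model has already been exported to the encoder, decoder, and tokens files that sherpa-onnx expects. The helper names `read_wave` and `transcribe`, and all file paths, are hypothetical; the sherpa-onnx documentation remains the reference for the export step and the exact options.

```python
import array
import sys
import wave


def read_wave(path):
    """Read a mono 16-bit PCM WAV file; return (sample_rate, floats in [-1, 1))."""
    with wave.open(path, "rb") as f:
        if f.getnchannels() != 1 or f.getsampwidth() != 2:
            raise ValueError("expected mono 16-bit PCM audio")
        rate = f.getframerate()
        pcm = array.array("h", f.readframes(f.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV data is little-endian
    return rate, [s / 32768.0 for s in pcm]


def transcribe(wav_path, encoder, decoder, tokens):
    """Run a Whisper-style ONNX model through sherpa-onnx and return the text."""
    # Imported lazily so read_wave stays usable without these packages.
    import numpy as np
    import sherpa_onnx

    recognizer = sherpa_onnx.OfflineRecognizer.from_whisper(
        encoder=encoder,  # path to the exported encoder .onnx file
        decoder=decoder,  # path to the exported decoder .onnx file
        tokens=tokens,    # path to the tokens file produced by the export
    )
    rate, samples = read_wave(wav_path)
    stream = recognizer.create_stream()
    stream.accept_waveform(rate, np.asarray(samples, dtype=np.float32))
    recognizer.decode_stream(stream)
    return stream.result.text
```

A call would look like `transcribe("audio.wav", "encoder.onnx", "decoder.onnx", "tokens.txt")`, with the placeholder paths replaced by the files from your own export.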

H-G-11 commented 10 months ago

Thank you so much, I will have a look at it!