huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
MIT License

distil-whisper doesn't work as a drop-in replacement for whisper #18

Closed: o-alexandre-felipe closed this issue 7 months ago

o-alexandre-felipe commented 8 months ago

If one of the goals of distil-whisper is to be a drop-in replacement for Whisper models, it would be useful to be able to cast it to an object of type whisper.Whisper, so that it could be used with any custom decoder implemented for whisper.

Practical issue

I ran into a few problems when trying to use the model with stable_whisper:

from transformers import WhisperForConditionalGeneration
import whisper
import stable_whisper
pt = WhisperForConditionalGeneration.from_pretrained('distil-whisper/distil-medium.en')
stable_whisper.modify_model(pt.model)
audio = whisper.load_audio('sample.mp3')
pt.model.transcribe(audio)

The first issue is that it doesn't have the dims and is_multilingual properties.

pt.model.dims = whisper.load_model('large-v2').dims
pt.model.is_multilingual = False

That gives AttributeError: 'BaseModelOutput' object has no attribute 'dtype'

Next I tried to load the state_dict into a whisper model, but that doesn't work either:

    Missing key(s) in state_dict: "encoder.positional_embedding", "encoder.blocks.0.attn.query.weight", "encoder.blocks.0.attn.query.bias", "encoder.blocks.0.attn.key.weight", ...
    Unexpected key(s) in state_dict: "encoder.embed_positions.weight", "encoder.layers.0.self_attn.k_proj.weight", "encoder.layers.0.self_attn.v_proj.weight", ...
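The mismatch above looks like a naming difference rather than a shape difference, so one possible direction is to rename the keys before loading. The following is only a minimal sketch of that idea: the rename table is partial, the `k_proj` to `key` and `embed_positions.weight` to `positional_embedding` pairs are inferred from the error message, and the `q_proj` and `v_proj` pairs are assumptions by analogy. The helper names `rename_key` and `rename_state_dict` are hypothetical, and a complete conversion would need more rules than shown here.

```python
import re

# Partial, illustrative rename rules: (pattern in transformers key, replacement
# in openai-whisper key). Only the pairs visible in the error message are
# grounded; the q_proj and v_proj rules are assumed by analogy with k_proj.
RENAMES = [
    (r"\.embed_positions\.weight$", ".positional_embedding"),
    (r"\.layers\.(\d+)\.", r".blocks.\1."),
    (r"\.self_attn\.q_proj\.", ".attn.query."),  # assumption
    (r"\.self_attn\.k_proj\.", ".attn.key."),
    (r"\.self_attn\.v_proj\.", ".attn.value."),  # assumption
]


def rename_key(key):
    """Apply every rename rule in order to a single state_dict key."""
    for pattern, replacement in RENAMES:
        key = re.sub(pattern, replacement, key)
    return key


def rename_state_dict(state_dict):
    """Return a new dict with renamed keys; the values are left untouched."""
    return {rename_key(k): v for k, v in state_dict.items()}


if __name__ == "__main__":
    for name in ("encoder.embed_positions.weight",
                 "encoder.layers.0.self_attn.k_proj.weight"):
        print(name, "->", rename_key(name))
```

Keys that match no rule pass through unchanged, so any names still reported as missing or unexpected after renaming show which rules are not covered by this sketch.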

In summary

What would it take to cast the distil model to a whisper.Whisper, so that it can be a drop-in alternative for a broader set of applications?

sanchit-gandhi commented 8 months ago

Hey @o-alexandre-felipe - I've added instructions to the model card: https://huggingface.co/distil-whisper/distil-large-v2#running-whisper-in-openai-whisper

Let me know if that fixes your issue!

madroidmaq commented 7 months ago

@sanchit-gandhi It works for me.

sanchit-gandhi commented 7 months ago

Great! Thanks for confirming @madroidmaq. Going to close this one for now - feel free to open a new issue or re-open this if the problem persists @o-alexandre-felipe.

o-alexandre-felipe commented 7 months ago

The section was renamed; the link is now https://huggingface.co/distil-whisper/distil-large-v2#running-distil-whisper-in-openai-whisper