rhasspy / wyoming-faster-whisper

Wyoming protocol server for faster whisper speech to text system
MIT License

Distil whisper and other models beyond the existing static list #23

Closed khalob closed 8 months ago

khalob commented 9 months ago

@synesthesiam I appreciate all the work you and the HA team are doing. Was curious though, if you had any thoughts on the following:

  1. This may be a stupid question / I might not fully understand, but would this project see improvements if the https://github.com/huggingface/distil-whisper models were used?

    • Per the faster-whisper github, it is 4 times faster than openai/whisper for the same accuracy while using less memory.
    • Per the distil-whisper GitHub, it is 6 times faster, 49% smaller, and performs within 1% word error rate (WER). Whether those savings would compound, I have no idea.
  2. Currently, as you mentioned elsewhere, this project pulls models from a static list corresponding to GitHub-downloadable models that you supply. Would you be opposed to a PR allowing local (already downloaded) models?

Thanks again :)

khalob commented 9 months ago

Just realized this same question was asked previously. Whoops. See https://github.com/rhasspy/wyoming-faster-whisper/issues/10 for context. There is a forked repo for GPU and distil model usage, but I would really love to have it implemented natively.

ChiefJReloaded commented 8 months ago

I second this. It would be great to have more models, especially since many of them are already on HuggingFace. I see two major ways this could be accomplished:

  1. Extend the static list with models that are either created from conversions, etc. or taken directly from HuggingFace.
  2. Use HuggingFace as the sole source to pull models from, and push models that are not yet there in some form to HuggingFace (e.g. the -int8 variants, which I didn't find there).

If help is needed, I would be glad to offer it.

synesthesiam commented 8 months ago

This has been done in the 2.0.0 release: https://github.com/rhasspy/wyoming-faster-whisper/releases/tag/v2.0.0

I uploaded the int8 variants to HuggingFace, and the --model argument can now be anything that WhisperModel supports, such as tiny.en, tiny, base.en, base, small.en, small, medium.en, medium, large-v1, large-v2, large-v3, large, distil-large-v2, distil-medium.en, distil-small.en, or a HuggingFace model ID.
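For anyone landing here later, starting the server with one of the new model names looks roughly like this. This is a sketch based on the flags in the project's README (the port, directories, and choice of model here are placeholders, not a recommendation):

```shell
# Run the Wyoming server with a distil model from the 2.0.0 supported list.
# --model accepts any name WhisperModel supports, or a HuggingFace model ID.
python3 -m wyoming_faster_whisper \
  --model distil-small.en \
  --language en \
  --uri tcp://0.0.0.0:10300 \
  --data-dir ./data \
  --download-dir ./data
```

Passing a HuggingFace model ID (e.g. something like `Systran/faster-distil-whisper-large-v2`) in place of the short name should also work, since the argument is forwarded to WhisperModel.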

khalob commented 8 months ago

@synesthesiam you're the best. Thank you!

ChiefJReloaded commented 8 months ago

@synesthesiam Very nice, thank you!