natrys / whisper.el

Speech-to-Text interface for Emacs using OpenAI's whisper model and whisper.cpp as inference engine.

faster-whisper? #11

Closed 10 months ago. Opened by ekg.

ekg commented 1 year ago

Could we use https://github.com/guillaumekln/faster-whisper?

natrys commented 1 year ago

This looks very interesting, especially as it seems to perform much better than whisper.cpp even in CPU benchmarks. Thanks for bringing this to my attention.

Although I am a little miffed that they don't have a simple CLI interface; it's more of a Python library that you use from your own Python code. Not a show stopper, I guess, as we could provide our own CLI entry script.
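
To make that concrete, a minimal entry script around the faster-whisper API could look roughly like this (just a sketch, untested; the flag names below are made up here, only WhisperModel and transcribe() are faster-whisper's actual API):

```python
# Rough sketch of a CLI entry script around faster-whisper (untested).
# Only WhisperModel and transcribe() are faster-whisper's actual API;
# the flag names below are made up for illustration.
import argparse

from faster_whisper import WhisperModel


def main():
    parser = argparse.ArgumentParser(description="Transcribe audio with faster-whisper")
    parser.add_argument("audio", help="path to the audio file")
    parser.add_argument("--model", default="base.en", help="model size or path")
    parser.add_argument("--device", default="cpu", help="cpu or cuda")
    parser.add_argument("--language", default=None, help="force a language, e.g. en")
    args = parser.parse_args()

    model = WhisperModel(args.model, device=args.device, compute_type="int8")
    segments, _info = model.transcribe(args.audio, language=args.language)
    # transcribe() returns a lazy generator; iterating it drives the decoding.
    for segment in segments:
        print(segment.text.strip())


if __name__ == "__main__":
    main()
```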

Another issue to consider is that using whisper.cpp allows us to automate installation, model download, etc. Again, not a show stopper: pip install --user can lead to weird issues, but we could use our own venv. As for model download, we would have to provide some more plumbing code.
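
For the venv route, the plumbing could be as small as something like this (again just a sketch, untested; the install location under ~/.emacs.d/whisper is a made-up example):

```python
# Sketch of the installation plumbing (untested): keep faster-whisper in a
# private venv instead of doing pip install --user. The location under
# ~/.emacs.d is a made-up example; on Windows the venv uses Scripts\ instead of bin/.
import subprocess
import sys
from pathlib import Path

VENV_DIR = Path.home() / ".emacs.d" / "whisper" / "venv"  # hypothetical location


def ensure_faster_whisper() -> Path:
    if not VENV_DIR.exists():
        # Create an isolated venv so we never touch the user's site-packages.
        subprocess.run([sys.executable, "-m", "venv", str(VENV_DIR)], check=True)
    pip = VENV_DIR / "bin" / "pip"
    # Install (or upgrade) faster-whisper inside that venv only.
    subprocess.run([str(pip), "install", "--upgrade", "faster-whisper"], check=True)
    return VENV_DIR / "bin" / "python"


if __name__ == "__main__":
    print("venv python:", ensure_faster_whisper())
```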

doctorguile commented 10 months ago

The speed increase seems very promising. BTW, if you scroll further down the page,

https://github.com/Purfview/whisper-standalone-win

> Standalone executables of OpenAI's Whisper & Faster-Whisper for those who don't want to bother with Python.
>
> Faster-Whisper executables are compatible with Windows 7 x64, Linux v5.4, Mac OS X v10.15 and above. Meant to be used in command-line interface or Subtitle Edit.
>
> whisper-ctranslate2 is a command line client based on faster-whisper and compatible with the original client from openai/whisper.

so it seems there is already a CLI interface for it. I'm not sure whether that CLI wrapper also makes installation and model download easier, though.

sasinhe commented 10 months ago

Are there any updates on this improvement? I would like to contribute if possible.

OrionRandD commented 10 months ago

> This looks very interesting, especially as it seems to perform much better than whisper.cpp even in CPU benchmarks. Thanks for bringing this to my attention.
>
> Although I am a little miffed that they don't have a simple CLI interface; it's more of a Python library that you use from your own Python code. Not a show stopper, I guess, as we could provide our own CLI entry script.
>
> Another issue to consider is that using whisper.cpp allows us to automate installation, model download, etc. Again, not a show stopper: pip install --user can lead to weird issues, but we could use our own venv. As for model download, we would have to provide some more plumbing code.

Here is another one: https://github.com/huggingface/distil-whisper. It's supposed to be very fast...
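
From what I can tell it is also a Python library rather than a standalone CLI, so the same caveat applies; usage via the Hugging Face transformers pipeline would look roughly like this sketch (untested, model name and audio path are just examples):

```python
# Rough sketch (untested) of using a distil-whisper checkpoint through the
# Hugging Face transformers ASR pipeline. Model name and audio path are examples.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="distil-whisper/distil-medium.en")
result = asr("audio.wav")  # long recordings may need chunking options
print(result["text"])
```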

natrys commented 10 months ago

@sasinhe Actually, using any of the tools mentioned here and verifying the benefits would be a start.

FWIW, here is how my attempt at installing whisper-ctranslate2/faster-whisper went today, doing a pip install in a fresh venv of my system Python (3.12, on Linux):

I think there is very little chance I am going to subject myself to long-term exposure to the insanity that is Python package management. When you compare this to the fact that you can download random elisp code from 20 years ago and it still runs in current Emacs, it's mind-boggling.

But I was finally able to run a benchmark. I chose a random video from Prot's YouTube page; it's 29 minutes long, and here is how long each took:

whisper-ctranslate2

```shell
time whisper-ctranslate2 --model base.en --device cpu --threads 8 --language en --verbose False --output_format txt /tmp/emacs-whisper.wav
```

1m36.97s real 10m58.08s user 0m49.69s system

whisper.cpp

```shell
time ./main -t 8 --language en --model models/ggml-base.en.bin --no-timestamps --file /tmp/emacs-whisper.wav > output
```

1m03.15s real 7m55.90s user 0m02.06s system

So compared to whisper.cpp today, this is actually a regression in the most common scenario (CPU and a smaller model). Maybe in other scenarios it's a win, but I would need to see that data compared against an up-to-date version of whisper.cpp to be fully convinced.

And even if it makes sense to support these tools, it's clear that I couldn't take responsibility for installing them on the user's computer the way I do with whisper.cpp now. So it would be a bring-your-own-install-and-models situation, but then what we do here is barely a step removed from running a shell command in the background, so should anybody even be using whisper.el at that point?

doctorguile commented 10 months ago

I believe faster-whisper defaults to beam_size = 5, so for a fair comparison whisper.cpp needs to be run with matching settings, something like -bs 5 -bo 5 (--beam-size and --best-of).
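
On the faster-whisper side, this is the beam_size argument to transcribe(); a quick sketch (untested, the audio path is a placeholder) shows how to measure what the setting costs:

```python
# Time faster-whisper with greedy (beam_size=1) vs. default (beam_size=5)
# decoding, so it can be compared against whisper.cpp at matching settings.
# "audio.wav" is a placeholder path.
import time

from faster_whisper import WhisperModel

model = WhisperModel("base.en", device="cpu", compute_type="int8")

for beam in (1, 5):
    start = time.perf_counter()
    segments, _info = model.transcribe("audio.wav", beam_size=beam)
    # The generator is lazy; joining the segments actually runs the decode.
    text = "".join(segment.text for segment in segments)
    print(f"beam_size={beam}: {time.perf_counter() - start:.1f}s, {len(text)} characters")
```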

Actually, there is an issue open on the whisper.cpp tracker to discuss how to catch up with faster-whisper's speed, and there is already a plan for improvement.

The bottom line is that unless you have a powerful NVIDIA GPU you want to take advantage of, whisper.cpp is plenty fast for dictation on CPU alone. I also agree about Python dependency hell (how is this still an issue in 2023, with so much $$$ behind it?). Users would have to bring their own install anyway, because you'd have to install the NVIDIA driver and the cuDNN library as well (faster-whisper doesn't take care of that).

natrys commented 10 months ago

> Actually, there is an issue open on the whisper.cpp tracker to discuss how to catch up with faster-whisper's speed, and there is already a plan for improvement.

That's good to know, and it would save everyone quite a lot of headache. The faster-whisper repo has benchmarks using whisper.cpp from a February commit; I think the latter has evolved a lot since then, and hopefully it will continue to in the future.

There are ways to make whisper.cpp even faster by choosing an appropriate acceleration framework (even on GPU). I will add a reminder in the README for advanced users to recompile their whisper.cpp with support for an appropriate acceleration framework like CoreML, cuBLAS, CLBlast, OpenBLAS, etc. (though for me personally, OpenBLAS CPU acceleration did worse than the default).

There is also the option to use a quantized version of a bigger model, which sacrifices some accuracy (though hopefully it stays more accurate than a smaller model) to gain performance.
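
(Incidentally, on the faster-whisper side the roughly analogous knob would be compute_type, i.e. running a bigger model with int8 weights; a sketch, untested:)

```python
# Sketch only (untested): faster-whisper's rough counterpart to a quantized ggml
# model is the compute_type argument, here loading a bigger model with int8 weights.
# "audio.wav" is a placeholder path.
from faster_whisper import WhisperModel

model = WhisperModel("small.en", device="cpu", compute_type="int8")
segments, _info = model.transcribe("audio.wav")
print("".join(segment.text for segment in segments))
```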

> Users would have to bring their own install anyway, because you'd have to install the NVIDIA driver and the cuDNN library as well (faster-whisper doesn't take care of that).

Under those circumstances, I am more than amenable to adding minimal support for something like whisper-ctranslate2, but perhaps it would be better to document how to override the inference function entirely, so people can plug in whatever other program they might want to use.

natrys commented 10 months ago

The wiki now contains a recipe showing how to use whisper-ctranslate2 (which uses faster-whisper). I am therefore closing this issue, but we can raise any residual concerns here.

pedro-nonfree commented 9 months ago

> Actually, there is an issue open on the whisper.cpp tracker to discuss how to catch up with faster-whisper's speed, and there is already a plan for improvement.

I just wanted to post the issues here to make them easier to track.