mkiol / dsnote

Speech Note Linux app. Note taking, reading and translating with offline Speech to Text, Text to Speech and Machine translation.
Mozilla Public License 2.0

Add support of FasterWhisper #22

Closed. Breizhux closed this issue 1 year ago

Breizhux commented 1 year ago

Hello,

I really appreciate your project! I think it's going in a very nice and useful direction!

I see that you support the Coqui STT, Vosk and whisper.cpp engines. Would it be possible to add guillaumekln's faster-whisper STT engine?

FasterWhisper has the advantage of being much faster than whisper.cpp, while consuming relatively little extra RAM (the differences are shown in a table on its GitHub page). So I think it would be a great idea! The models have, if I've understood correctly, been modified, but they are available on HuggingFace (again, everything is clearly explained on its GitHub page).

Thanks in advance! Good luck with the rest of the project ;)

Breizhux

mkiol commented 1 year ago

Thank you for the idea and sorry for the late reply (I'm on vacation 🍹).

I tested Faster Whisper and it is indeed quicker. In my own benchmark, it is about one third faster on CPU compared to whisper.cpp.

Adding to the roadmap:

Breizhux commented 1 year ago

Thanks for this great news! Your app will soon be my favorite 😉️ Enjoy your vacation!

mkiol commented 1 year ago

Implemented in 965dc2026caf1420935ffa1bdabbc8bbd339efa1.

This change is included in the "beta" version. You can install and test the "beta" version from the flathub-beta channel.

Sadly, only CPU inference works in the Flatpak package. To enable CUDA, I would have to bundle NVIDIA's cuDNN library, which is huge: the unpacked size is ~2 GB.
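
For readers who want to try a CPU-only comparison like the one mentioned earlier in the thread, here is a minimal Python sketch. It is not the benchmark used above and not how Speech Note integrates the engine. It assumes the `faster-whisper` Python package is installed; the model size `"small"`, the `int8` compute type and the helper names (`mean_runtime`, `time_saved_fraction`, `transcribe_cpu`) are illustrative choices, not taken from the thread.

```python
import time
from typing import Callable


def mean_runtime(fn: Callable[[], object], runs: int = 3) -> float:
    """Return the mean wall-clock time in seconds of calling fn `runs` times."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


def time_saved_fraction(baseline_s: float, candidate_s: float) -> float:
    """Fraction of the baseline runtime saved by the candidate.

    A value of about 0.33 means the candidate needs one third less time.
    """
    if baseline_s <= 0:
        raise ValueError("baseline_s must be positive")
    return (baseline_s - candidate_s) / baseline_s


def transcribe_cpu(audio_path: str, model_size: str = "small") -> str:
    """Transcribe a file with faster-whisper on CPU only (no CUDA or cuDNN)."""
    # Imported lazily so the timing helpers above work without the package.
    from faster_whisper import WhisperModel

    # Assumption: int8 is used here as a common CPU-friendly compute type.
    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    segments, _info = model.transcribe(audio_path)
    # `segments` is a generator; the transcription runs while it is consumed.
    return "".join(segment.text for segment in segments)
```

A run could look like `mean_runtime(lambda: transcribe_cpu("sample.wav"))`, where `sample.wav` is a placeholder file name. Note that this timing includes model loading; for a fairer comparison, load the model once outside the timed function. Also, "one third faster" can mean either one third less time or one third more throughput; `time_saved_fraction` computes the former.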