kalliope-project / kalliope

Kalliope is a framework that will help you to create your own personal assistant.
https://kalliope-project.github.io/
GNU General Public License v3.0

Add support for OpenAI Whisper #687

Closed joshuaboniface closed 1 year ago

joshuaboniface commented 1 year ago

Adds basic support for OpenAI Whisper (local) as an STT provider.

This is supported by the upstream SpeechRecognition library, so this is just the standard translation to/from Kalliope.

Several options have been implemented and documented, including the ability to "unformat" the resulting strings if desired.
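For illustration, "unformatting" could look something like the sketch below. This is only an assumption about what the option does (lowercasing the text and stripping the punctuation Whisper adds, so the result is easier to match against spoken orders); the function name `unformat_transcript` is hypothetical and is not taken from the PR.

```python
import string


def unformat_transcript(text: str) -> str:
    """Lowercase a transcript and drop ASCII punctuation.

    Hypothetical helper: the name and exact behavior are assumptions,
    not the code merged in this PR.
    """
    # Translation table that deletes every ASCII punctuation character.
    table = str.maketrans("", "", string.punctuation)
    stripped = text.translate(table).lower()
    # split() + join() collapses runs of whitespace and trims both ends.
    return " ".join(stripped.split())
```

With this sketch, a Whisper result such as `" Turn on the light."` would be passed on as `turn on the light`.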

Also caps the Ansible version below 5 to fix test failures.

Sispheor commented 1 year ago

Tests are failing. I think we should cap the Ansible version below 5 here and here

joshuaboniface commented 1 year ago

I did notice that about the tests, but I wasn't sure how to handle it. Is there anything you want me to do in this PR?

Sispheor commented 1 year ago

Yes you can try the proposed changes.

joshuaboniface commented 1 year ago

Done, waiting on tests. I also accidentally added a commit for a second feature (FasterWhisper), which I have since force-pushed away. That feature requires an upstream change to speech_recognition (https://github.com/Uberi/speech_recognition/pull/693) before it can be added.

joshuaboniface commented 1 year ago

And it looks like that solved it. I suppose we can just keep that commit in here; I've updated the description.

Sispheor commented 1 year ago

Thx. You can rebase your other PR so I can merge it

jaggzh commented 10 months ago

Is this keeping the Whisper model loaded, or is it loaded on each call? Mine seems slow even with the tiny model.

jaggzh commented 10 months ago

Oh, this is great, btw! Thanks. :))

joshuaboniface commented 8 months ago

@jaggzh I believe that SpeechRecognition is loading it each time, which is indeed very slow. I ended up abandoning this myself, and using a custom integration with my own https://github.com/joshuaboniface/remote-faster-whisper tool.
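As a rough sketch of the general workaround (keeping the model in memory between calls rather than reloading it), something like the following could work. It is not what this PR or remote-faster-whisper does; `make_cached_loader` and `_load_whisper` are hypothetical names, and the default loader assumes the `openai-whisper` package is installed.

```python
from functools import lru_cache
from typing import Any, Callable


def _load_whisper(name: str) -> Any:
    # Imported lazily so this module can be imported without openai-whisper.
    import whisper

    return whisper.load_model(name)


def make_cached_loader(
    loader: Callable[[str], Any] = _load_whisper,
) -> Callable[[str], Any]:
    """Wrap a model loader so each model name is loaded at most once."""

    @lru_cache(maxsize=None)
    def get_model(name: str) -> Any:
        # Only runs on a cache miss; later calls reuse the loaded model.
        return loader(name)

    return get_model
```

In a long-running process, `get_model = make_cached_loader()` followed by `get_model("tiny").transcribe("order.wav")["text"]` would pay the load cost only on the first call. The cache lives in process memory, so it only helps if the same process handles every request.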

jaggzh commented 8 months ago

Your flask secret looks a lot more developed than my flask server :)

https://github.com/jaggzh/whisperpluck

(This project was for someone to use a GUI for transcription though. Server to keep a whisper model loaded.)
