Add the option to use Whisper instead of Assembly AI to generate audio transcriptions

simonsanvil / openai-whatsapp-chatbot

A chatbot app that uses OpenAI's GPT and DALL-E to reply to incoming messages from WhatsApp and generate images

MIT License


Add the option to use Whisper instead of Assembly AI to generate audio transcriptions #3

Open simonsanvil opened 1 year ago

simonsanvil commented 1 year ago

To keep with the theme of the repo, it would be better to use OpenAI's Whisper as the default option for voice-message transcriptions instead of AssemblyAI (which was initially included because this was intended to be submitted to their 2022 Winter Hackathon). We can either serve the model from the app (perhaps in a separate container) or use HuggingFace's inference endpoints.
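A rough sketch of what a Whisper-by-default setup could look like when the model is served from the app itself. It assumes the open-source `openai-whisper` package (`whisper.load_model`, `model.transcribe`) is installed along with ffmpeg. The names `TRANSCRIBERS`, `get_transcriber`, and `transcribe_with_assemblyai` are hypothetical and only stand in for whatever the app actually uses.

```python
from typing import Callable, Dict


def transcribe_with_local_whisper(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with a locally loaded Whisper model."""
    # Imported lazily so the app can start without the package installed.
    import whisper  # assumption: provided by the `openai-whisper` package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    return result["text"].strip()


def transcribe_with_assemblyai(audio_path: str) -> str:
    """Hypothetical placeholder for the app's existing AssemblyAI code path."""
    raise NotImplementedError("wire this to the existing AssemblyAI integration")


# Hypothetical registry of transcription backends, keyed by a config value.
TRANSCRIBERS: Dict[str, Callable[[str], str]] = {
    "whisper": transcribe_with_local_whisper,
    "assemblyai": transcribe_with_assemblyai,
}


def get_transcriber(name: str = "whisper") -> Callable[[str], str]:
    """Return the requested backend, with Whisper as the default."""
    try:
        return TRANSCRIBERS[name]
    except KeyError:
        raise ValueError(f"unknown transcription backend: {name!r}")
```

With something like this, `get_transcriber()` would pick Whisper unless the configuration explicitly asks for `"assemblyai"`. Loading the model on every call is slow, so a real version would probably cache it or run it in the separate container mentioned above.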

simonsanvil commented 1 year ago

This week OpenAI released their own API endpoint for audio transcription using Whisper. I attempted to include it in the last release of the app, but it doesn't seem to support the file format of WhatsApp audio files (ogg) yet, so a preprocessing step that downloads each file and converts it to mp3 would have to be added. Since I thought this would slow the bot down even more when answering audio messages, I decided to leave it out of this last release. I might explore it more in the near future.
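For reference, the preprocessing step described above could look roughly like the following. This is only a sketch under a few assumptions: the voice message has already been downloaded to a local `.ogg` file, the `ffmpeg` binary is on the PATH, and the `openai` Python package (v1 or later) is installed with `OPENAI_API_KEY` set. The function names are made up for illustration and are not part of the app.

```python
import subprocess
from pathlib import Path
from typing import List, Union

PathLike = Union[str, Path]


def build_ffmpeg_command(ogg_path: PathLike, mp3_path: PathLike) -> List[str]:
    """Build the ffmpeg call that converts an ogg file to mp3."""
    # -y overwrites an existing output file; -loglevel error keeps logs quiet.
    return ["ffmpeg", "-y", "-loglevel", "error", "-i", str(ogg_path), str(mp3_path)]


def convert_ogg_to_mp3(ogg_path: PathLike) -> Path:
    """Convert a downloaded voice message to mp3 next to the original file."""
    mp3_path = Path(ogg_path).with_suffix(".mp3")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(ogg_path, mp3_path), check=True)
    return mp3_path


def transcribe_voice_message(ogg_path: PathLike) -> str:
    """Convert the file, then send it to OpenAI's hosted Whisper endpoint."""
    # Imported lazily so the conversion helpers work without the package.
    from openai import OpenAI  # assumption: openai>=1.0 is installed

    mp3_path = convert_ogg_to_mp3(ogg_path)
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(mp3_path, "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return transcription.text
```

The extra latency comes from the ffmpeg call plus the upload, so timing `convert_ogg_to_mp3` on a few typical voice messages would show whether the slowdown is acceptable.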

eladrave commented 1 year ago

Deepgram is also an option. It costs much less and has speaker diarisation, etc. (They also offer Whisper.)

simonsanvil commented 1 year ago

Definitely something we could try
