Open simonsanvil opened 1 year ago
This week OpenAI released their own API endpoint for audio transcription using Whisper. I attempted to include it in the last release of the app, but it doesn't seem to support the file format of WhatsApp audio files (ogg) yet, so a preprocessing step that downloads the audio and converts it to mp3 would have to be added. Since I thought this would slow the bot down even more when answering audio messages, I decided to leave it out of this release. I might explore it more in the near future.
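For reference, the conversion step could look roughly like the sketch below. It assumes `ffmpeg` (built with an MP3 encoder) is installed on the host and that the voice note has already been downloaded to disk. The helper names `build_ffmpeg_command` and `convert_ogg_to_mp3` are hypothetical and not part of the app.

```python
import shutil
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build the ffmpeg invocation that re-encodes `src` into `dst`."""
    # -y overwrites dst if it already exists, -vn ignores any video stream.
    # ffmpeg picks the output format from the ".mp3" extension of dst.
    return ["ffmpeg", "-y", "-i", str(src), "-vn", str(dst)]


def convert_ogg_to_mp3(src: Path) -> Path:
    """Convert a downloaded voice note to MP3, written next to the original."""
    # Assumption: ffmpeg is available on PATH in the bot's environment.
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on PATH")
    dst = src.with_suffix(".mp3")
    # check=True raises CalledProcessError if the conversion fails.
    subprocess.run(build_ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

The extra latency would come from the download plus this re-encode, so it is probably worth timing on a few real voice notes before deciding whether it is acceptable.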
Deepgram is also an option. It costs much less and has speaker diarisation, etc. (They also offer Whisper.)
Definitely something we could try
To keep with the theme of the repo, it would be better to use OpenAI's Whisper as the default option for voice-message transcription instead of AssemblyAI (which was initially included because this was intended to be submitted to their 2022 Winter Hackathon). We can either serve the model from the app (perhaps in a separate container) or use HuggingFace's inference endpoints.
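One way to make Whisper the default while keeping AssemblyAI available is a small backend registry, sketched below. This is only an illustration: every name here (`register_backend`, `get_transcriber`, `DEFAULT_BACKEND`, the two placeholder functions) is hypothetical, and the backends are stubs rather than real calls to Whisper or AssemblyAI.

```python
from typing import Callable, Dict, Optional

# A transcriber takes the path of an audio file and returns the transcript.
Transcriber = Callable[[str], str]

_BACKENDS: Dict[str, Transcriber] = {}
DEFAULT_BACKEND = "whisper"  # assumption: Whisper becomes the default


def register_backend(name: str) -> Callable[[Transcriber], Transcriber]:
    """Decorator that stores a transcriber under `name`."""
    def decorator(fn: Transcriber) -> Transcriber:
        _BACKENDS[name] = fn
        return fn
    return decorator


@register_backend("whisper")
def transcribe_with_whisper(audio_path: str) -> str:
    # Stub: this is where a self-hosted model or a hosted endpoint would be called.
    raise NotImplementedError("Whisper backend is not wired up in this sketch")


@register_backend("assemblyai")
def transcribe_with_assemblyai(audio_path: str) -> str:
    # Stub: the existing AssemblyAI integration would be wrapped here.
    raise NotImplementedError("AssemblyAI backend is not wired up in this sketch")


def get_transcriber(name: Optional[str] = None) -> Transcriber:
    """Return the requested backend, falling back to the default one."""
    key = name or DEFAULT_BACKEND
    try:
        return _BACKENDS[key]
    except KeyError:
        raise ValueError(f"unknown transcription backend: {key!r}") from None
```

With something like this, the message handler would only call `get_transcriber()`, and switching between a containerised model, a hosted endpoint, or AssemblyAI becomes a configuration change.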