open-webui / open-webui

User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
https://openwebui.com
MIT License
45.58k stars, 5.57k forks

feat: audio transcription playground #1211

Open g4challenge opened 7 months ago

g4challenge commented 7 months ago

Is your feature request related to a problem? Please describe.
I find it challenging when I need to manually transcribe audio content. Whether it's interviews, meetings, or recorded conversations, having an automated audio transcription feature would significantly improve my workflow.

Describe the solution you'd like
I would like OpenWebUI to include an audio transcription feature. Ideally, it should accept audio files (such as MP3, WAV, or other common formats) and convert them into accurate text transcripts. The transcripts should be time-stamped and easily accessible within the interface.

Describe alternatives you've considered
As an alternative, I've explored third-party transcription services based on Whisper with a UI (https://github.com/chidiwilliams/buzz or https://github.com/jhj0517/Whisper-WebUI), but they often come with limitations around installation and sharing, privacy concerns, and additional cost and effort. Having an integrated solution within OpenWebUI would streamline the process and enhance the overall user experience.

Additional context
Sometimes I participate in remote interviews or attend virtual meetings where audio recordings are essential. Having a built-in transcription feature would save time and effort, allowing me to focus on the content rather than on manual transcription. Once a transcript is finished, I would love to be able to pass it to an LLM with predefined prompts, e.g. "Use the following transcript to create a short, precise summary in bullet points."
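To make the request more concrete, here is a minimal sketch of the last step: turning a time-stamped transcript into a predefined summary prompt. The `Segment` shape, the function names, and the `[HH:MM:SS - HH:MM:SS]` layout are assumptions made for illustration, not anything OpenWebUI provides today. The prompt text is adapted from the example above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One transcribed span of audio (hypothetical shape; times in seconds)."""
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as HH:MM:SS, dropping fractions of a second."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(segments: List[Segment]) -> str:
    """Produce one time-stamped line per segment."""
    return "\n".join(
        f"[{format_timestamp(s.start)} - {format_timestamp(s.end)}] {s.text.strip()}"
        for s in segments
    )


# Predefined prompt; {transcript} is filled in by build_summary_prompt.
SUMMARY_PROMPT = (
    "Use the following transcript to create a short, precise summary "
    "in bullet points.\n\n{transcript}"
)


def build_summary_prompt(segments: List[Segment], template: str = SUMMARY_PROMPT) -> str:
    """Insert the rendered transcript into a predefined prompt template."""
    return template.format(transcript=render_transcript(segments))


if __name__ == "__main__":
    demo = [
        Segment(0.0, 4.2, "Welcome everyone."),
        Segment(4.2, 9.8, " Let's review the agenda. "),
    ]
    print(build_summary_prompt(demo))
```

The resulting string could then be sent to whichever model is selected in the chat. Keeping the template as a parameter would let users store several predefined prompts.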

arjunkrishna commented 6 months ago

Yes, having audio and video transcription would be a very useful feature.

arjunkrishna commented 6 months ago

https://github.com/the-crypt-keeper/tldw

rexkani commented 2 weeks ago

This is one of the main features I was looking for when I installed OpenWebUI.

flefevre commented 2 weeks ago

In scientific research, it would be very useful to be able to record a meeting, summarize it, and keep it in the workspace. Perhaps it should be compatible with Milvus to store the audio and the notes?

I have used https://github.com/JigsawStack/insanely-fast-whisper-api and https://github.com/Vaibhavs10/insanely-fast-whisper

Trapper4888 commented 1 week ago

To add my 2 cents: since OpenWebUI already runs an integrated Whisper (and can also use one through an API), it really feels like a wasted opportunity not to be able to use it directly. The same goes for TTS. I imagine a lot of the code is already there, since both are used behind the scenes.

But I have to acknowledge that OpenWebUI is supposed to be a text-to-text (t2t) UI, and adding STT and TTS may be out of scope and increase complexity. In a perfect world, I would host my own OpenAI-API-compatible Whisper Docker container and connect it to the OpenWebUI container, and for direct Whisper usage I would use another container with a proper OpenAI-API-compatible TTS web UI.

Still, it would be very cool to have basic STT and TTS using the microphone and files in OpenWebUI.
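For anyone trying the self-hosted route in the meantime, here is a rough sketch of a request to an OpenAI-compatible transcription endpoint, using only the Python standard library. The base URL, the default model name, and the helper name `build_transcription_request` are assumptions made for illustration. The path `/v1/audio/transcriptions` and the `file`, `model`, and `response_format` form fields follow the OpenAI API, but a given self-hosted server may not support every field or format.

```python
import urllib.request
import uuid
from typing import Optional


def build_transcription_request(
    base_url: str,
    audio_bytes: bytes,
    filename: str,
    model: str = "whisper-1",               # assumption: server-specific model name
    response_format: str = "verbose_json",  # OpenAI's format that includes segment timings
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data POST for transcription."""
    boundary = uuid.uuid4().hex
    parts = []

    # Plain text form fields.
    for name, value in (("model", model), ("response_format", response_format)):
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode("utf-8")
        )

    # The audio file itself, sent as raw bytes.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode("utf-8")
        + audio_bytes
        + b"\r\n"
    )

    # Closing boundary marks the end of the form.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"

    url = base_url.rstrip("/") + "/v1/audio/transcriptions"
    return urllib.request.Request(url, data=b"".join(parts), headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send it, for example with a hypothetical local server at `http://localhost:8000`. With `verbose_json`, the OpenAI API returns per-segment start and end times, which would cover the time-stamped transcripts asked for at the top of this issue.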