gpt-omni / mini-omni

An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
https://arxiv.org/abs/2408.16725
MIT License
3.06k stars 273 forks

Is there any possibility for RAG with Mini-Omni? #59

Closed yash-chudasama closed 1 month ago

superFilicos commented 1 month ago

We believe the simplest approach is for a small model to determine, based on the semantics of the input, whether RAG is needed, and then append the relevant text to the input as supporting material.
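A rough sketch of that idea, not taken from the Mini-Omni codebase: the "small model" is replaced here by a hypothetical keyword heuristic (`needs_retrieval`), and `retrieve` is any caller-supplied function that returns text passages. All names and the prompt layout are assumptions for illustration.

```python
import re

# Hypothetical trigger words; a real system would use a small classifier model instead.
TRIGGER_WORDS = {"who", "what", "when", "where", "which", "price", "latest"}


def needs_retrieval(query):
    """Stand-in for the small routing model: True if the query looks factual."""
    words = set(re.findall(r"[a-z']+", query.lower()))
    return bool(words & TRIGGER_WORDS)


def build_input(query, retrieve, needs_rag=needs_retrieval):
    """Return the query unchanged, or the query with retrieved text as supporting material."""
    if not needs_rag(query):
        return query
    passages = retrieve(query)
    if not passages:
        # Nothing relevant found, so fall back to the plain query.
        return query
    context = "\n".join(f"- {p}" for p in passages)
    return f"Supporting material:\n{context}\n\nQuestion: {query}"
```

Swapping `needs_rag` for a call to an actual classifier would leave the rest of the function unchanged.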

zzhiyun commented 1 month ago

So when using RAG, do we search directly with the voice input, or convert the voice into text first and search with that? Is the knowledge base stored as audio or as text?

mini-omni commented 1 month ago

I suppose RAG may not work well with the current version of Mini-Omni. If you want to use RAG, I think you should transcribe the audio question into text, retrieve the related context, and input the question and context as text, just as with other LLMs. Then you can get the answer as audio or just as text.
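The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `transcribe` and `generate` are hypothetical callables standing in for a speech-to-text model and a text-input LLM (they are not Mini-Omni APIs), and the retriever is a toy word-overlap ranker rather than a real vector search.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, top_k=2):
    """Toy retriever: rank documents by the number of tokens shared with the question."""
    q_tokens = tokenize(question)
    scored = [(len(q_tokens & tokenize(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]  # drop documents with no overlap
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def answer_spoken_question(audio, transcribe, generate, documents, top_k=2):
    """Transcribe the audio, retrieve context, then query the model with text only."""
    question = transcribe(audio)            # hypothetical speech-to-text step
    context = retrieve(question, documents, top_k)
    prompt = "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"
    return generate(prompt)                 # hypothetical text-in LLM call
```

Because the knowledge base and the query are both text in this layout, any existing text retriever can be dropped in for `retrieve`.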

mini-omni commented 1 month ago

I'll close this for now. Please feel free to re-open it.