yash-chudasama closed 1 month ago
So when using RAG, do we search directly with the voice input, or convert the voice into text and search with that? Is the knowledge base stored as audio or as text?
I suppose RAG may not work well with the current version of Mini-Omni. If you want to use RAG, I think you should: transcribe the audio question into text, retrieve the related context, input the question and context as text just as with other LLMs, then get the answer as audio or just text.
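To make the suggested flow concrete, here is a minimal sketch, not Mini-Omni code. The `transcribe` and `generate` callables are placeholders for whatever speech-to-text and language model you use, and the word-overlap `retrieve` is a toy stand-in for a real retriever (embedding search, BM25, and so on). All names here are hypothetical.

```python
from typing import Callable, List


def retrieve(question: str, documents: List[str], top_k: int = 2) -> List[str]:
    """Toy retriever (assumption): rank documents by shared lowercase words."""
    q_words = set(question.lower().split())
    scored = [(len(q_words & set(doc.lower().split())), doc) for doc in documents]
    # Drop documents with no overlap, then keep the best-scoring ones.
    scored = [item for item in scored if item[0] > 0]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_prompt(question: str, context: List[str]) -> str:
    """Combine the retrieved context and the question into one text input."""
    if not context:
        return f"Question: {question}"
    joined = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


def answer_spoken_question(
    audio: bytes,
    transcribe: Callable[[bytes], str],  # hypothetical speech-to-text hook
    generate: Callable[[str], str],      # hypothetical text-in, answer-out hook
    documents: List[str],
) -> str:
    question = transcribe(audio)             # 1. audio question -> text
    context = retrieve(question, documents)  # 2. look up related text
    prompt = build_prompt(question, context) # 3. question + context as text
    return generate(prompt)                  # 4. answer (text, or audio via TTS)
```

With this layout the knowledge base stays plain text, and only the first and last steps touch audio.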
I'll close it for now, please feel free to re-open.
We believe the simplest approach is for a small model to determine, based on the semantics of the input, whether RAG is needed, and then append the relevant text to the input as supporting material.
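A rough sketch of that gating idea, under stated assumptions: `needs_retrieval` stands in for the small model's decision, `retrieve` for any text retriever, and `keyword_gate` is only a toy placeholder so the example runs. None of these names come from Mini-Omni.

```python
from typing import Callable, List


def augment_input(
    user_text: str,
    needs_retrieval: Callable[[str], bool],  # hypothetical: the small model's yes/no decision
    retrieve: Callable[[str], List[str]],    # hypothetical: returns relevant text passages
) -> str:
    """Append retrieved text to the input only when the gate asks for it."""
    if not needs_retrieval(user_text):
        return user_text
    passages = retrieve(user_text)
    if not passages:
        return user_text
    material = "\n".join(passages)
    return f"{user_text}\n\nSupporting material:\n{material}"


def keyword_gate(user_text: str) -> bool:
    """Toy placeholder for the small model: flags fact-seeking questions."""
    triggers = ("who", "when", "where", "what", "which", "how many")
    return user_text.lower().startswith(triggers)
```

For example, `augment_input("hello there", keyword_gate, my_retriever)` returns the input unchanged, while a question starting with "when" gets the retrieved passages appended. In practice the gate would be a small classifier or LLM call rather than a keyword check.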