gpt-omni / mini-omni2

Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities.
https://arxiv.org/abs/2410.11190
MIT License
1.59k stars 188 forks

Is it possible to do RAG with this model? #35

Open ParthArora11 opened 2 weeks ago

ParthArora11 commented 2 weeks ago

Hi, thank you for your excellent work. As we know, text-to-text models can perform Retrieval-Augmented Generation (RAG). To clarify my use case: my personal data is in text format, but the assistant's input could be either audio or text. I’d like to avoid converting audio to text for contextual retrieval. I have a couple of questions:

  1. Is it possible to search documents using voice embeddings directly by passing in voice data?
  2. Is it possible to provide contextual text as input to the model alongside an audio file?
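
For illustration, question 1 amounts to nearest-neighbor search in a shared embedding space. Below is a minimal sketch, assuming a hypothetical encoder that maps both audio and text into the same vector space (nothing in this thread indicates that mini-omni2 provides one). The vectors, file names, and the `retrieve` helper are made-up placeholders, not part of any real API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve(query_vec, doc_vecs, top_k=1):
    """Return the ids of the top_k documents most similar to query_vec."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(query_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:top_k]

# Placeholder embeddings of text documents (hypothetical values).
doc_vecs = {
    "notes.txt": [0.9, 0.1, 0.0],
    "recipes.txt": [0.0, 0.2, 0.9],
}

# Placeholder for the output of a hypothetical audio encoder
# that shares its vector space with the text embeddings above.
voice_query_vec = [0.8, 0.2, 0.1]

print(retrieve(voice_query_vec, doc_vecs))
```

The retrieval step itself is modality-agnostic; whether it works on voice input depends entirely on having an audio encoder aligned with the text embeddings.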

mini-omni commented 1 week ago

Hi, for now the model is trained only on single-turn dialogue data, so it does not support RAG or in-context learning.