An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
I'm not very familiar with LLM training in multiple modalities, such as images or audio. Are there any instructions for beginners, or somewhere I can find tutorial material? Thanks.