Closed: Enchante503 closed this 1 month ago
Yes. The audio clips used for training are typically short, while the text output tends to be longer, so the speech can easily end up incomplete when the text is particularly long.
The audio is shorter than the generated text and doesn't say the whole thing.
mini-omni is great. Is it possible to improve fluency and adjust the speaking speed? Could the generated text also be displayed on the demo screen along with the audio?