Blazzycrafter opened this issue 8 months ago
We're planning to release long-form and streaming soon after we've had some bandwidth to push code with faster inference...
By the way, can you point me to how you're generating 500+ characters / streaming with XTTS? I've tried https://huggingface.co/spaces/coqui/xtts, but it has a 200-character limit...
Hey @vatsalaggarwal, is that release still in the pipeline?
@platform-kit, yes release is still planned. We just released fine-tuning capabilities #93. We are now going to start working on long-form & streaming.
Would love insights on the below:

> By the way, can you point me to how you're generating 500+ characters / streaming with XTTS? I've tried https://huggingface.co/spaces/coqui/xtts, but it has a 200-character limit...
@sidroopdaska The way I did this in my implementation of XTTS (https://github.com/Render-AI/cog-xtts-v2/blob/main/predict.py) was to split the text into chunks (e.g. sentences, though it could be done in other ways), render each chunk as a separate audio output, and then concatenate the audio.
You do lose some context this way, but it makes the output very stable, avoiding failure modes where the voice trails off as the duration increases, for example.
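The split-and-concatenate approach above can be sketched roughly as follows. This is a minimal illustration, not the actual code from the linked repo: the sentence splitter is a naive regex, and `tts_fn` is a hypothetical placeholder standing in for a single XTTS inference call that returns a waveform as a NumPy array.

```python
import re
import numpy as np

def split_into_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., !, or ? followed by whitespace.
    # A real implementation might use nltk or spaCy for robustness.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def synthesize_long_form(text, tts_fn, pause_sec=0.3, sample_rate=24000):
    """Render each sentence separately, then join with short pauses.

    tts_fn(sentence) -> np.ndarray is a hypothetical single-chunk TTS
    call (e.g. one XTTS generation); it is NOT a real XTTS API.
    """
    pause = np.zeros(int(pause_sec * sample_rate), dtype=np.float32)
    chunks = []
    for sentence in split_into_sentences(text):
        chunks.append(tts_fn(sentence))
        chunks.append(pause)
    if not chunks:
        return np.zeros(0, dtype=np.float32)
    # Drop the trailing pause before concatenating.
    return np.concatenate(chunks[:-1])
```

Because each chunk is rendered independently, the model never sees a prompt near or past its context limit, which is what keeps the output stable at the cost of cross-sentence prosody.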
Would love insights on the below:

> By the way, can you point me to how you're generating 500+ characters / streaming with XTTS? I've tried https://huggingface.co/spaces/coqui/xtts, but it has a 200-character limit...
@sidroopdaska daswer123 has made a WebUI that accepts unlimited text input, though API streaming is still coming: https://github.com/daswer123/xtts-webui
I want to use it in role-plays, and the text is mostly 500+ characters, so generation takes a long time... Is a streaming mode planned, like in XTTS?