long-form/streaming support?

metavoiceio / metavoice-src

Foundational model for human-like, expressive TTS

https://themetavoice.xyz/

Apache License 2.0


long-form/streaming support? #53

Open Blazzycrafter opened 8 months ago

Blazzycrafter commented 8 months ago

I want to use it in role plays, and the text is mostly 500+ characters long, so generation takes a long time. Is there a streaming mode planned, like in XTTS?

vatsalaggarwal commented 8 months ago

We're planning to release long-form and streaming soon after we've had some bandwidth to push code with faster inference...

by the way, can you point me to how you're generating 500+ chars / streaming with xtts? i've tried https://huggingface.co/spaces/coqui/xtts but this has a 200 chars limit...

platform-kit commented 8 months ago

Hey @vatsalaggarwal, is that release still in the pipeline?

sidroopdaska commented 8 months ago

@platform-kit, yes, the release is still planned. We just released fine-tuning capabilities (#93) and are now going to start working on long-form and streaming.

Would love insights on the below

> by the way, can you point me to how you're generating 500+ chars / streaming with xtts? i've tried https://huggingface.co/spaces/coqui/xtts but this has a 200 chars limit...

platform-kit commented 8 months ago

@sidroopdaska The way I did this in my implementation of XTTS (https://github.com/Render-AI/cog-xtts-v2/blob/main/predict.py) was to split the text into chunks (i.e. sentences, but it could be done in other ways), then render each sentence as an audio output and then concatenate the audio.

You do lose some context this way but it makes the output very stable (avoiding weird outputs where the voice trails off as the duration increases, for example).
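The chunk-and-concatenate approach described above could look roughly like the sketch below. It is not the code from the linked predict.py and not a metavoice-src API: `split_into_chunks`, `stream_audio`, `render_long_form`, the `max_chars` default of 200, and the `fake_synthesize` stand-in are all hypothetical names chosen for illustration, and audio is modelled as a plain list of floats so the example runs without any TTS library.

```python
import re
from typing import Callable, Iterator, List

# Placeholder waveform type (assumption); a real model would likely return an array or tensor.
Audio = List[float]


def split_into_chunks(text: str, max_chars: int = 200) -> List[str]:
    """Split text at sentence boundaries and pack sentences into chunks.

    Chunks stay within max_chars where possible; a single sentence longer
    than max_chars is kept whole as its own over-long chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def stream_audio(
    text: str, synthesize: Callable[[str], Audio], max_chars: int = 200
) -> Iterator[Audio]:
    """Yield audio chunk by chunk, so playback can start before the full text is rendered."""
    for chunk in split_into_chunks(text, max_chars):
        yield synthesize(chunk)


def render_long_form(
    text: str, synthesize: Callable[[str], Audio], max_chars: int = 200
) -> Audio:
    """Render every chunk and concatenate the results into one waveform."""
    audio: Audio = []
    for piece in stream_audio(text, synthesize, max_chars):
        audio.extend(piece)
    return audio


def fake_synthesize(chunk: str) -> Audio:
    """Stand-in for a real TTS call (assumption): one silent sample per character."""
    return [0.0] * len(chunk)
```

To try it with a real model, `fake_synthesize` would be swapped for whatever function turns a short string into audio. Each chunk is rendered independently, which is where the loss of context mentioned above comes from.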

MethanJess commented 7 months ago

> Would love insights on the below
>
> by the way, can you point me to how you're generating 500+ chars / streaming with xtts? i've tried https://huggingface.co/spaces/coqui/xtts but this has a 200 chars limit...

@sidroopdaska daswer123 has made a WebUI that accepts text input of unlimited length. API streaming is still listed as coming soon, though. https://github.com/daswer123/xtts-webui