huggingface / parler-tts

Inference and training library for high-quality TTS models.
Apache License 2.0
2.6k stars, 265 forks

Streaming support? #12

Open · jay2jp opened this issue 1 month ago

jay2jp commented 1 month ago

Is there any streaming support for this model? If there is a way to do it, I would love to get involved and help out!

ylacombe commented 3 weeks ago

Hey @jay2jp, thanks for opening the discussion!

Streaming could be done by adapting the Streamer used for MusicGen to the current code! It'd be great if you could be involved in this! What do you think?

Let me know if you need help! Best
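A minimal sketch of the streaming pattern described above, assuming a streamer that follows the `put()`/`end()` shape of `transformers`' `BaseStreamer`: the generation loop pushes codec frames into the streamer, which decodes them into audio every `play_steps` frames and hands each chunk to a consumer through a queue. `ChunkedAudioStreamer`, `decode_fn`, `fake_decode`, and `fake_generate` are hypothetical names; frames are plain ints rather than tensors, and the sketch deliberately skips the codebook delay pattern and chunk-overlap handling that a real MusicGen or Parler-TTS streamer would need.

```python
import queue
import threading
from typing import Callable, List, Optional


class ChunkedAudioStreamer:
    """Hypothetical streamer with the put()/end() shape of transformers' BaseStreamer.

    Buffers generated codec frames and decodes them into audio chunks every
    `play_steps` frames; a consumer iterates over the streamer to receive chunks.
    """

    def __init__(self, decode_fn: Callable[[List[int]], List[float]],
                 play_steps: int = 10, timeout: Optional[float] = None):
        self.decode_fn = decode_fn    # hypothetical codec decoder: frames -> samples
        self.play_steps = play_steps  # frames to accumulate before decoding a chunk
        self.timeout = timeout
        self.frames: List[int] = []
        self.emitted = 0              # number of frames already decoded
        self.audio_queue: queue.Queue = queue.Queue()
        self.stop_signal = None

    def put(self, value: List[int]) -> None:
        # Called by the generation loop with the frames produced at each step.
        self.frames.extend(value)
        if len(self.frames) - self.emitted >= self.play_steps:
            self._flush()

    def end(self) -> None:
        # Called once generation finishes: decode any leftover frames, then stop.
        if len(self.frames) > self.emitted:
            self._flush()
        self.audio_queue.put(self.stop_signal, timeout=self.timeout)

    def _flush(self) -> None:
        # Decode only the frames not yet emitted and queue the resulting audio.
        chunk = self.decode_fn(self.frames[self.emitted:])
        self.emitted = len(self.frames)
        self.audio_queue.put(chunk, timeout=self.timeout)

    def __iter__(self):
        return self

    def __next__(self) -> List[float]:
        chunk = self.audio_queue.get(timeout=self.timeout)
        if chunk is self.stop_signal:
            raise StopIteration
        return chunk


def fake_decode(frames: List[int]) -> List[float]:
    # Stand-in decoder: pretend every codec frame yields 4 audio samples.
    return [float(f) for f in frames for _ in range(4)]


streamer = ChunkedAudioStreamer(fake_decode, play_steps=3)


def fake_generate() -> None:
    # Stand-in for a generation loop that calls put()/end(): one frame per step.
    for step in range(7):
        streamer.put([step])
    streamer.end()


producer = threading.Thread(target=fake_generate)
producer.start()
chunk_sizes = [len(chunk) for chunk in streamer]  # consume chunks as they arrive
producer.join()
print(chunk_sizes)  # [12, 12, 4]
```

Running generation in a separate thread is what lets the consumer play or forward each chunk while later frames are still being produced; the final short chunk comes from `end()` flushing whatever is left in the buffer.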

Mortezanavidi commented 17 hours ago

Hey @ylacombe,

I would be more than happy to adapt the module to the current parler-tts. I did some research and found that @sanchit-gandhi has already created a Space for it on Hugging Face, along with the Streamer module, so I don't know if it still needs to be added.

But to make the module more advanced, I wanted to ask: is there any way to get the duration of each word being streamed, and its start time (offset) in the full audio? This would give developers much more control over the streamed audio; for example, it could be used for lip-syncing or real-time function control, among other things.

So if it's possible in any way, I would be more than happy to extend the Parler streaming module (with a little bit of your kind help, of course 😅).
Best regards
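On the timing question: the thread doesn't say whether the model exposes word-level durations, so this sketch only covers what can be derived from the audio itself, namely the start offset and duration of each streamed chunk, computed from cumulative sample counts and the output sampling rate. Attributing those offsets to individual words would need alignment information that this sketch does not provide. `TimedChunk` and `with_offsets` are hypothetical names, and the 16 kHz rate in the example is chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class TimedChunk:
    start_s: float        # offset of the chunk's first sample in the full audio
    duration_s: float     # length of the chunk in seconds
    samples: List[float]


def with_offsets(chunks: Iterable[List[float]], sampling_rate: int) -> Iterator[TimedChunk]:
    """Wrap a stream of audio chunks, tagging each with its position in the full audio."""
    elapsed = 0  # samples emitted so far across all previous chunks
    for samples in chunks:
        yield TimedChunk(
            start_s=elapsed / sampling_rate,
            duration_s=len(samples) / sampling_rate,
            samples=samples,
        )
        elapsed += len(samples)


# Example: three chunks at an illustrative 16 kHz sampling rate.
timed = list(with_offsets([[0.0] * 8000, [0.0] * 8000, [0.0] * 4000], sampling_rate=16000))
print([(c.start_s, c.duration_s) for c in timed])  # [(0.0, 0.5), (0.5, 0.5), (1.0, 0.25)]
```

Because `with_offsets` is a generator, it can wrap any iterator of audio chunks, such as a streamer's output, and tag each chunk as it arrives instead of waiting for generation to finish.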