Hi author, thanks for sharing this creative project.
While reading the paper and the code, I noticed that speaker labels do not seem to be needed when training LauraTTS. The code appears to confirm this: dataset.py and the other data-processing .py files show that training relies only on wav.scp and phoneme.list, and the training data doesn't need to be spliced together by speaker. So I wonder: do FunCodec and LauraTTS really support zero-shot TTS? If my guess is wrong, I'd appreciate your explanation :)