Hi author, thanks for sharing this creative project.
While reading the paper and the code, I noticed that speaker labels do not seem to be needed when training LauraTTS. The code appears to confirm this: dataset.py and the other data-processing .py files show that training relies only on wav.scp and phoneme.list, and the training data doesn't need to be spliced together by speaker. So I wonder: do FunCodec and LauraTTS really support zero-shot TTS? If my guess is wrong, I'd appreciate your explanation :)