LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning

Reproducing the training setup for the TTS system in the LibriTTS-P paper #2

Open ajd12342 opened 1 month ago

ajd12342 commented 1 month ago

Hello! Thank you for the LibriTTS-P dataset release and paper. I am interested in reproducing your prompt-based controllable TTS system and had a few questions about how you select the appropriate speaker prompts for each utterance in the dataset.

  1. The paper explains that three different annotators labelled each speaker with perception and impression words, so there are three annotations per speaker. How did you select which annotation to use for each utterance of a given speaker?
  2. Could you release the list of templates you used, such as 'The speaker’s identity can be described as...' and 'Descriptions of the speaker’s vocal style are...'?

Thanks in advance!

r9y9 commented 1 week ago

Hi, sorry for the late reply. To answer your questions:

  1. AFAIK, during each training iteration, one of the three annotators is randomly sampled with equal probability for each utterance; a sketch of this sampling step is shown after this list.
  2. See https://github.com/line/promptttspp/blob/3e6bd0eaa7d0bfadb5f33a530726dd78efc748dd/promptttspp/datasets/all_with_spk_prompt_norm.py#L141-L159
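
For illustration, here is a minimal sketch of that sampling step, assuming the three annotations per speaker are stored as lists of prompt strings keyed by speaker ID. The names `annotations_by_speaker` and `sample_annotation` (and the example strings) are hypothetical, not taken from the promptttspp codebase:

```python
import random

# Hypothetical storage: speaker ID -> the three annotators' descriptions.
annotations_by_speaker: dict[str, list[str]] = {
    "spk_0001": [
        "A calm, low-pitched male voice.",
        "He speaks slowly with a deep, relaxed tone.",
        "A soft and gentle masculine voice.",
    ],
}

def sample_annotation(speaker_id: str) -> str:
    """Pick one of the three annotators' descriptions uniformly at random.

    Called per utterance per training iteration, so the model sees all
    three annotations for each speaker over the course of training.
    """
    return random.choice(annotations_by_speaker[speaker_id])

# Each call may return a different annotator's description.
print(sample_annotation("spk_0001"))
```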

For the full details of training the PromptTTS++ baseline system, see https://github.com/line/promptttspp
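
For reference, here is a simplified sketch of how such templates might be filled with the sampled annotation words to form a full speaker prompt. The two templates below are the ones quoted in the question; the actual template list and combination logic live in the linked `all_with_spk_prompt_norm.py`, and `build_speaker_prompt` is a hypothetical name:

```python
import random

# Illustrative subset: the two templates quoted above. The full set is
# defined in promptttspp/datasets/all_with_spk_prompt_norm.py.
TEMPLATES = [
    "The speaker's identity can be described as {}.",
    "Descriptions of the speaker's vocal style are {}.",
]

def build_speaker_prompt(style_words: list[str]) -> str:
    """Fill a randomly chosen template with comma-joined style words."""
    template = random.choice(TEMPLATES)
    return template.format(", ".join(style_words))

# Example usage with hypothetical perception/impression words.
print(build_speaker_prompt(["calm", "low-pitched", "masculine"]))
```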