Open KeiKinn opened 4 months ago
Thank you for your attention. Prompt diversity refers to the inclusion of 500 distinct natural-language text prompts for each style (such as high speech rate, high pitch, or low energy). This is fundamental to the design of TextrolSpeech. You can see it for yourself by downloading the files in the Val directory. In contrast, PromptTTS provides only a few text prompts for each style, which is insufficient for effective model training.
Hi, thank you for your great work.
I read the paper and downloaded the dataset, but I still do not fully understand '500 distinct natural text description'. It seems to be a very important statement in your paper. Where does it come from? How do you define 'diversity' for each style? Do audio clips that share the same 'gender', 'pitch'... have different style prompts? Could you please explain it more clearly?