Hello,
I am using the VITS model for text-to-speech synthesis with a configuration that specifies a single voice. However, I am encountering two issues:
Sometimes the output speech is partly in a male voice and partly in a female voice, even though the configuration is set to use a single voice.
Here is my current configuration: