king-dahmanus opened this issue 6 months ago
The model is trained on the LJSpeech and LibriTTS datasets, which are flawed: both contain transcription errors, and one of those errors causes the incorrect pause at a dash. To mitigate this, you'll need to train your own model on your own data. There is more useful information about training and fine-tuning in this discussion thread, too.
Can I download the data from "https://keithito.com/LJ-Speech-Dataset/" and train on it?
Yes, you could. However, bear in mind that the dataset is flawed, so you would end up with exactly the same problems as the StyleTTS2 sample model.
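If you do start from LJSpeech, it can help to find the clips whose transcriptions contain hyphenated words before training, so you can listen to them and correct the text. The sketch below assumes the standard LJSpeech `metadata.csv` layout (pipe-separated `id|raw transcription|normalized transcription`, no header); `find_hyphenated` is a hypothetical helper, not part of StyleTTS2. It only lists candidates and will not detect any other kind of transcription error.

```python
import csv
import re
from pathlib import Path

# A word joined to one or more others by hyphens, e.g. "white-clothed".
HYPHENATED = re.compile(r"\b\w+(?:-\w+)+\b")


def find_hyphenated(lines):
    """Return (clip_id, [hyphenated words]) for each row whose normalized
    transcription contains a hyphenated compound.

    `lines` is any iterable of text lines (an open file works). Assumes the
    LJSpeech layout id|raw text|normalized text; only column 3 is scanned,
    so hyphens in clip IDs such as LJ001-0001 are ignored.
    """
    hits = []
    # QUOTE_NONE: transcriptions contain literal quote characters.
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3:
            continue  # skip malformed or empty rows
        clip_id, normalized = row[0], row[2]
        words = HYPHENATED.findall(normalized)
        if words:
            hits.append((clip_id, words))
    return hits


if __name__ == "__main__":
    # Assumed extraction path; adjust to wherever you unpacked the dataset.
    path = Path("LJSpeech-1.1/metadata.csv")
    if path.exists():
        with path.open(encoding="utf-8", newline="") as f:
            for clip_id, words in find_hyphenated(f):
                print(clip_id, ", ".join(words))
```

Taking an iterable of lines rather than a path keeps the function easy to test and lets you feed it other metadata files that use the same column order.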
Hello there, StyleTTS2 devs. It's a great model; you really did a good job. I mainly use it through the HF demo, but there are some issues. Firstly, it pauses after the dash (-) symbol, so please fix that. For example, it reads "white-clothed" as "White. Clothed." Secondly, it sometimes produces random bursts of distorted noise and skips words. Can you find a way to fix this? Is this an issue with the pretrained model or with the architecture itself? Thanks and regards.
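If retraining is not an option, one possible stopgap (not something suggested in this thread) is to remove hyphens inside compound words before the text reaches the model, so "white-clothed" is sent as "white clothed". This is a minimal sketch; `soften_hyphens` is a hypothetical helper, and whether the pretrained model then reads the compound without a pause is an untested assumption. It does nothing about the bursts of distorted noise.

```python
import re

# One or more hyphens with a word character directly on each side,
# as in "white-clothed". Spaced dashes (" - ") do not match.
INTRA_WORD_HYPHEN = re.compile(r"(?<=\w)-+(?=\w)")


def soften_hyphens(text: str, replacement: str = " ") -> str:
    """Replace hyphens inside compound words before synthesis.

    Assumption: the model handles a plain space better than a hyphen.
    Dashes surrounded by spaces are left alone, since they often do
    mark an intended pause.
    """
    return INTRA_WORD_HYPHEN.sub(replacement, text)


if __name__ == "__main__":
    print(soften_hyphens("A white-clothed man - tall and well-known."))
```

Using a space rather than an empty string avoids gluing words into forms like "whiteclothed", which a phonemizer may mispronounce.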