Closed choiHkk closed 9 months ago
I also had to use the VAD trim from Coqui TTS; otherwise VCTK training didn't work at all.
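For illustration, here is a minimal energy-based silence trimmer. This is a hedged sketch, not the actual Coqui TTS VAD (which uses a proper voice-activity model); the function name, threshold, and frame size are all my own choices, but it conveys the preprocessing idea: cut leading and trailing silence before training.

```python
import numpy as np

def trim_silence(wav: np.ndarray, sr: int, threshold_db: float = -40.0,
                 frame_ms: float = 25.0) -> np.ndarray:
    """Trim leading/trailing silence by frame energy (a rough stand-in
    for a real VAD-based trim)."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(wav) // frame_len
    frames = wav[: n_frames * frame_len].reshape(n_frames, frame_len)
    # RMS energy per frame, in dB relative to the loudest frame
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-10
    db = 20 * np.log10(rms / rms.max())
    voiced = np.where(db > threshold_db)[0]
    if voiced.size == 0:
        return wav  # nothing above threshold; leave untouched
    start = voiced[0] * frame_len
    end = (voiced[-1] + 1) * frame_len
    return wav[start:end]
```

A real pipeline would use an actual VAD model, but even a crude trim like this removes the long silent margins that throw off the duration predictor.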
@AWAS666 I agree with that. When training directly on raw waveforms that had not been preprocessed, none of the duration-related modules learned properly, and the loss did not converge either.
Thanks for this PR. Can you add a corresponding config.json for vctk with updated cleaners? Thanks a lot!!
@p0p4k we just need to change "text_cleaners" in the config file to "english_cleaners3". I've added the config file, so please confirm it.
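A small sketch of the change, assuming a VITS-style config where "text_cleaners" lives under the top-level "data" section (the exact layout may differ in this repo's config):

```python
import json

# Hypothetical minimal config fragment; only the field we change matters.
config = {
    "data": {
        "text_cleaners": ["english_cleaners2"],
    }
}

# The one-line change discussed above:
config["data"]["text_cleaners"] = ["english_cleaners3"]

with open("config_vctk.json", "w") as f:
    json.dump(config, f, indent=2)
```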
There is one point I didn't mention: when using preprocess.py to generate filelists, we also need to pass "english_cleaners3" to the parser, because of the following code.
```python
# https://github.com/p0p4k/vits2_pytorch/blob/main/preprocess.py#L17C4-L17C85
...
parser.add_argument("--text_cleaners", nargs="+", default=["english_cleaners2"])
...
```
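Since the default is "english_cleaners2", the flag has to be overridden explicitly. A self-contained reconstruction of just that argument shows the override (the surrounding preprocess.py arguments are omitted here):

```python
import argparse

# Rebuild only the relevant argument from preprocess.py.
parser = argparse.ArgumentParser()
parser.add_argument("--text_cleaners", nargs="+", default=["english_cleaners2"])

# Equivalent to running: python preprocess.py --text_cleaners english_cleaners3
args = parser.parse_args(["--text_cleaners", "english_cleaners3"])
print(args.text_cleaners)  # → ['english_cleaners3']
```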
I found that when training on the VCTK multi-speaker dataset, training did not work well with the raw data. In the last code you uploaded, I made some additional changes regarding "preprocess".
The added features are as follows:
- text
- audio