p0p4k / vits2_pytorch

unofficial vits2-TTS implementation in pytorch
https://arxiv.org/abs/2307.16430
MIT License

add preprocess pipelines #60

Closed choiHkk closed 9 months ago

choiHkk commented 9 months ago

I found that when training with the VCTK multi-speaker dataset, training did not work well on the raw data. On top of the last code you uploaded, I made some additional changes to the preprocessing.

The added features are as follows:

- text
- audio
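To illustrate the text side, here is a hypothetical sketch of a VITS-style cleaner (lowercasing, charset filtering, whitespace collapsing). The repo's actual `english_cleaners3` does more (e.g. phonemization), so this is only a stand-in for the idea:

```python
import re

def english_cleaner_sketch(text: str) -> str:
    """Illustrative text cleaner: lowercase, drop characters outside a
    basic charset, collapse whitespace. Not the repo's english_cleaners3,
    which additionally phonemizes the input."""
    text = text.lower()
    # Replace anything outside a-z, apostrophe, space, and basic punctuation
    text = re.sub(r"[^a-z' .,?!]", " ", text)
    # Collapse runs of whitespace into a single space
    text = re.sub(r"\s+", " ", text).strip()
    return text

print(english_cleaner_sketch("Hello,  WORLD!\n"))  # hello, world!
```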

AWAS666 commented 9 months ago

I also had to use the VAD trim from coqui-TTS; otherwise, VCTK training didn't work at all.

choiHkk commented 9 months ago

@AWAS666 I agree with that. When training directly on a raw waveform that has not undergone preprocessing, none of the duration-related modules were properly learned, and the loss did not converge either.
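A minimal sketch of why trimming matters: the simple amplitude gate below removes leading/trailing silence, which otherwise inflates the durations the aligner must learn. This is only a stand-in; a real pipeline (e.g. coqui-TTS's VAD trim) uses a proper voice-activity detector:

```python
import numpy as np

def trim_silence(wav: np.ndarray, threshold: float = 0.01) -> np.ndarray:
    """Trim leading/trailing samples whose absolute amplitude is below
    threshold. A crude energy gate, not a real VAD."""
    above = np.flatnonzero(np.abs(wav) >= threshold)
    if above.size == 0:
        return wav[:0]  # all silence
    return wav[above[0] : above[-1] + 1]

# Silence padding around a short "voiced" segment is removed:
wav = np.concatenate([np.zeros(100), 0.5 * np.ones(50), np.zeros(100)])
print(trim_silence(wav).shape)  # (50,)
```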

p0p4k commented 9 months ago

Thanks for this PR. Can you add a corresponding config.json for vctk with updated cleaners? Thanks a lot!!

choiHkk commented 9 months ago

@p0p4k we just need to change "text_cleaners" in the config file to "english_cleaners3". I've added the config file, so please review it.
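For reference, the relevant change in a VCTK config would look like the fragment below (the surrounding keys are assumed to follow the repo's existing config.json layout):

```json
"data": {
  "text_cleaners": ["english_cleaners3"]
}
```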

choiHkk commented 9 months ago

There is a point I didn't mention. When using preprocess.py to generate filelists, we need to pass "english_cleaners3" to the parser because of the following code:

```python
# https://github.com/p0p4k/vits2_pytorch/blob/main/preprocess.py#L17C4-L17C85
...
parser.add_argument("--text_cleaners", nargs="+", default=["english_cleaners2"])
...
```
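The behavior can be sketched with a minimal argparse setup mirroring that line (the real preprocess.py has more arguments; this only shows why the flag must be passed explicitly):

```python
import argparse

# Same argument definition as the line quoted above
parser = argparse.ArgumentParser()
parser.add_argument("--text_cleaners", nargs="+", default=["english_cleaners2"])

# Without the flag, the stale default is silently used:
print(parser.parse_args([]).text_cleaners)  # ['english_cleaners2']

# Passing it explicitly matches the VCTK config:
args = parser.parse_args(["--text_cleaners", "english_cleaners3"])
print(args.text_cleaners)  # ['english_cleaners3']
```

So a filelist-generation command should include `--text_cleaners english_cleaners3` to stay consistent with the config.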