Closed choiHkk closed 9 months ago
I also had to use the VAD trim from Coqui TTS; otherwise VCTK training didn't work at all.
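For illustration, here is a minimal energy-based silence trimmer. This is a hedged sketch, not the actual Coqui TTS VAD (which uses a proper voice-activity model); the function name, threshold, and frame size are all my own choices, but it conveys the preprocessing idea: cut leading and trailing silence before training.

```python
import numpy as np

def trim_silence(wav: np.ndarray, sr: int, threshold_db: float = -40.0,
                 frame_ms: float = 25.0) -> np.ndarray:
    """Trim leading/trailing silence by frame energy (a rough stand-in
    for a real VAD-based trim)."""
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(wav) // frame_len
    frames = wav[: n_frames * frame_len].reshape(n_frames, frame_len)
    # RMS energy per frame, in dB relative to the loudest frame
    rms = np.sqrt(np.mean(frames ** 2, axis=1)) + 1e-10
    db = 20 * np.log10(rms / rms.max())
    voiced = np.where(db > threshold_db)[0]
    if voiced.size == 0:
        return wav  # nothing above threshold; leave untouched
    start = voiced[0] * frame_len
    end = (voiced[-1] + 1) * frame_len
    return wav[start:end]
```

A real pipeline would use an actual VAD model, but even a crude trim like this removes the long silent margins that throw off the duration predictor.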
@AWAS666 I agree with that. When training directly on raw waveforms that had not been preprocessed, none of the duration-related modules learned properly, and the loss did not converge either.
Thanks for this PR. Can you add a corresponding config.json for vctk with updated cleaners? Thanks a lot!!
@p0p4k we just need to change "text_cleaners" in the config file to "english_cleaners3". I've added the config file, so please confirm it.
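A small sketch of the change, assuming a VITS-style config where "text_cleaners" lives under the top-level "data" section (the exact layout may differ in this repo's config):

```python
import json

# Hypothetical minimal config fragment; only the field we change matters.
config = {
    "data": {
        "text_cleaners": ["english_cleaners2"],
    }
}

# The one-line change discussed above:
config["data"]["text_cleaners"] = ["english_cleaners3"]

with open("config_vctk.json", "w") as f:
    json.dump(config, f, indent=2)
```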
There is one point I didn't mention: when using preprocess.py to generate filelists, we also need to pass "english_cleaners3" to the parser, because of the following code.
```python
# https://github.com/p0p4k/vits2_pytorch/blob/main/preprocess.py#L17C4-L17C85
...
parser.add_argument("--text_cleaners", nargs="+", default=["english_cleaners2"])
...
```
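Since the default is "english_cleaners2", the flag has to be overridden explicitly. A self-contained reconstruction of just that argument shows the override (the surrounding preprocess.py arguments are omitted here):

```python
import argparse

# Rebuild only the relevant argument from preprocess.py.
parser = argparse.ArgumentParser()
parser.add_argument("--text_cleaners", nargs="+", default=["english_cleaners2"])

# Equivalent to running: python preprocess.py --text_cleaners english_cleaners3
args = parser.parse_args(["--text_cleaners", "english_cleaners3"])
print(args.text_cleaners)  # → ['english_cleaners3']
```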
I found that when training on the VCTK multi-speaker dataset, training did not work well with the raw data. In the last code you uploaded, I made some additional changes regarding "preprocess".
The added features are as follows:
- text
- audio