can you suggest settings for 16000hz sample rate and n_fft 1024 training

winddori2002 / TriAAN-VC

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion

MIT License


can you suggest settings for 16000hz sample rate and n_fft 1024 training #5

Closed bharaniyv closed 1 year ago

bharaniyv commented 1 year ago

I want to retrain the model at a 16 kHz sample rate with n_fft 1024 and n_shift 256, so that the output model can be used with standard vocoders, but it is not working. Can you suggest the changes required to train from scratch at 16000 Hz with n_fft 1024, n_shift 256, and window_length 1024?

Thanks

winddori2002 commented 1 year ago

Hi,

Do you use the same settings for the VC model and the vocoder? The settings include n_fft, n_shift, and the log transformation. For ParallelWaveGAN, we also use log mel-spectrograms.

Thanks.
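A minimal sketch of that consistency check, assuming both configurations are available as plain dictionaries. The key names (`sample_rate`, `n_fft`, `n_shift`, `win_length`, `n_mels`, `log_base`) and all values below are hypothetical placeholders, not read from the TriAAN-VC or vocoder config files.

```python
# Hypothetical key names and values, for illustration only.
CHECKED_KEYS = ("sample_rate", "n_fft", "n_shift", "win_length", "n_mels", "log_base")


def find_mismatches(vc_cfg, vocoder_cfg, keys=CHECKED_KEYS):
    """Return {key: (vc_value, vocoder_value)} for every key whose values differ."""
    return {
        key: (vc_cfg.get(key), vocoder_cfg.get(key))
        for key in keys
        if vc_cfg.get(key) != vocoder_cfg.get(key)
    }


# Placeholder settings for a VC model and a vocoder.
vc_cfg = {"sample_rate": 16000, "n_fft": 1024, "n_shift": 160,
          "win_length": 1024, "n_mels": 80, "log_base": "10"}
vocoder_cfg = {"sample_rate": 16000, "n_fft": 1024, "n_shift": 256,
               "win_length": 1024, "n_mels": 80, "log_base": "e"}

mismatches = find_mismatches(vc_cfg, vocoder_cfg)
for key, (vc_value, vocoder_value) in mismatches.items():
    print(f"{key}: VC model uses {vc_value}, vocoder expects {vocoder_value}")
```

Any key reported here is a candidate cause for degraded vocoder output, since the vocoder would then see features it was not trained on.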

bharaniyv commented 1 year ago

Hi, I want to use the pre-released HiFi-GAN instead of PWG. It uses n_fft 1024 and n_shift 256, so I wanted to retrain TriAAN-VC with those options, but I am facing errors. Can you suggest any changes to make it work with those parameters?

winddori2002 commented 1 year ago

Is the error about training, or about the performance after retraining? Actually, I'm also in the process of training TriAAN-VC on LibriTTS, which is compatible with HiFi-GAN. Since HiFi-GAN takes mel-spectrograms as input, the preprocessing step should be changed (this repo provides the steps for log mel-spectrograms). You may refer to the TacotronSTFT functions for the steps.
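One part of that preprocessing difference is the log transformation. The sketch below assumes one pipeline stores log10 mel values with a small floor and the other stores natural-log values with a clipping floor; the function names and the floor values (`1e-10`, `1e-5`) are assumptions for illustration and should be checked against the actual preprocessing code of both repositories. It does not cover differences in the mel filterbank or normalization.

```python
import math


def compress_log10(mel, eps=1e-10):
    """log10 compression with a floor (assumed style of a log10-based pipeline)."""
    return [math.log10(max(value, eps)) for value in mel]


def compress_natural(mel, clip_val=1e-5):
    """Natural-log compression with a clipping floor (assumed HiFi-GAN-style input)."""
    return [math.log(max(value, clip_val)) for value in mel]


def log10_to_natural(log10_mel):
    """Convert log10 values to natural-log values: ln(x) = log10(x) * ln(10)."""
    return [value * math.log(10.0) for value in log10_mel]


# Toy linear-scale mel magnitudes, all above both floors.
mel = [0.5, 1.0, 2.0]
converted = log10_to_natural(compress_log10(mel))
direct = compress_natural(mel)
max_error = max(abs(a - b) for a, b in zip(converted, direct))
print(max_error)
```

The conversion only matches for values above both floors. Values below the larger floor are clipped differently, so recomputing the features from audio is safer than converting stored ones.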

bharaniyv commented 1 year ago

The error is a size mismatch between the dimensions of the CPC input and the mel input to the encoder-decoder model.

winddori2002 commented 1 year ago

The sizes can differ because the pre-trained CPC extractor processes audio in 10 ms frames. So if you change the configuration, the CPC extractor has to be retrained. If training the CPC extractor is costly, it may be better to use the mel-spectrogram version instead of the CPC version.
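The arithmetic behind the mismatch can be sketched as follows. It assumes 16 kHz audio, a CPC step of 10 ms (160 samples), and the requested n_shift of 256. The frame count is approximated as samples divided by hop, ignoring padding and edge effects, and `approx_frames` is a hypothetical helper, not a function from the repository.

```python
SAMPLE_RATE = 16000  # Hz
CPC_HOP = 160        # samples; 10 ms at 16 kHz
MEL_HOP = 256        # samples; the requested n_shift


def hop_ms(hop, sample_rate=SAMPLE_RATE):
    """Duration of one hop in milliseconds."""
    return 1000.0 * hop / sample_rate


def approx_frames(num_samples, hop):
    """Approximate frame count, ignoring padding and edge effects."""
    return num_samples // hop


one_second = SAMPLE_RATE  # number of samples in one second of audio
cpc_frames = approx_frames(one_second, CPC_HOP)
mel_frames = approx_frames(one_second, MEL_HOP)

print(f"CPC: {hop_ms(CPC_HOP)} ms per frame, {cpc_frames} frames per second")
print(f"mel: {hop_ms(MEL_HOP)} ms per frame, {mel_frames} frames per second")
```

Under these assumptions, one second of audio gives about 100 CPC frames but only about 62 mel frames, so the two feature sequences no longer line up along the time axis.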

bharaniyv commented 1 year ago

Thanks for the clarification, but the mel-spectrogram version is not as good as the CPC version, right? Have you tried other options like WavLM or wav2vec? Do you think any of those would work?

winddori2002 commented 1 year ago

I have tried wav2vec, but as far as I remember it did not lead to a significant improvement.