winddori2002 / TriAAN-VC

TriAAN-VC: Triple Adaptive Attention Normalization for Any-to-Any Voice Conversion
MIT License

Evaluation results #13

Closed: Blakey-Gavin closed this issue 1 year ago

Blakey-Gavin commented 1 year ago

After training for 200 epochs on Chinese datasets, I got the following results:

[Image: evaluation results]

For these results I trained only on the Chinese datasets and did not retrain CPC. Could this be why my results differ so much from those reported in your paper?

However, if I remember correctly, the CPC-audio paper mentions that the model transfers well to other languages. Is that right?

winddori2002 commented 1 year ago

Hi

Did you re-train the vocoder? The CPC authors say it transfers well, but fine-tuning adapts it better to a new language. If you have already re-trained the vocoder, you can check by using mel-spectrogram features instead of CPC features.

Blakey-Gavin commented 1 year ago

Yes, I'm currently re-training the vocoder. Apart from a few parameters such as the sample rate, I follow the settings in the source code.
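When the sample rate changes, the feature extractor and the vocoder must still agree on their frame settings. Below is a minimal, hedged sketch of a consistency check. `AudioConfig`, `check_config`, and `frames_per_second` are hypothetical names, not TriAAN-VC's actual config keys. The rule that the vocoder's upsampling factors multiply to the hop length holds for upsampling GAN vocoders such as HiFi-GAN; treat it as an assumption about whichever vocoder you use.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class AudioConfig:
    """Hypothetical feature/vocoder settings; field names are illustrative only."""
    sample_rate: int
    n_fft: int
    hop_length: int
    win_length: int
    n_mels: int
    fmax: float
    vocoder_upsample_rates: tuple  # assumed: per-layer upsampling factors of the vocoder


def check_config(cfg: AudioConfig) -> list:
    """Return human-readable problems; an empty list means no issue was found."""
    problems = []
    if cfg.win_length > cfg.n_fft:
        problems.append("win_length must not exceed n_fft")
    if cfg.fmax > cfg.sample_rate / 2:
        problems.append("fmax exceeds the Nyquist frequency")
    # Assumption: the vocoder turns one feature frame into hop_length waveform samples.
    if prod(cfg.vocoder_upsample_rates) != cfg.hop_length:
        problems.append("product of vocoder upsample rates must equal hop_length")
    return problems


def frames_per_second(cfg: AudioConfig) -> float:
    """Feature frame rate implied by the sample rate and hop length."""
    return cfg.sample_rate / cfg.hop_length
```

For example, 16 kHz audio with `hop_length=256` yields 62.5 feature frames per second. Changing the sample rate without adjusting `fmax` or the vocoder's upsampling factors is the kind of mismatch this check is meant to catch.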

So, to get good performance on Chinese or other datasets, I can use either of the following two approaches?

  1. Fine-tune CPC on the new datasets, then train TriAAN-VC and the vocoder on the new datasets;
  2. Leave CPC unchanged, use mel-spectrograms of the new datasets as the input to TriAAN-VC, and re-train the vocoder on the new datasets.

Thank you very much for taking the time to reply; it helps me a lot.

winddori2002 commented 1 year ago

Yes. If you want to use another dataset (especially one in a different language), you need to re-train TriAAN-VC and the vocoder. Fine-tuning CPC might work better, but if that is difficult, I recommend using mel-spectrograms as the input to TriAAN-VC.
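For the mel-spectrogram route, the input features are usually log-mel spectrograms. Below is a minimal NumPy sketch of extracting them. It assumes 16 kHz audio, `n_fft=1024`, `hop_length=256`, and 80 mel bands; these are illustrative values, not necessarily TriAAN-VC's settings. The function names are hypothetical and this is not the repository's own preprocessing. The HTK mel scale, magnitude (not power) spectrum, and natural log are one common set of choices. Whatever you pick must match exactly the features the vocoder is trained on.

```python
import numpy as np


def hz_to_mel(freq_hz):
    """HTK-style mel scale (one common convention among several)."""
    return 2595.0 * np.log10(1.0 + np.asarray(freq_hz, dtype=np.float64) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels, fmin=0.0, fmax=None):
    """Triangular filters, shape (n_mels, n_fft // 2 + 1), evenly spaced on the mel scale."""
    fmax = sr / 2 if fmax is None else fmax
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        left, center, right = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def log_mel_spectrogram(wav, sr=16000, n_fft=1024, hop_length=256,
                        win_length=1024, n_mels=80, eps=1e-5):
    """Return a (n_frames, n_mels) natural-log magnitude mel spectrogram."""
    if win_length > n_fft:
        raise ValueError("win_length must not exceed n_fft")
    wav = np.asarray(wav, dtype=np.float64)
    # Centre frames on their samples by reflect-padding n_fft // 2 on both sides.
    wav = np.pad(wav, (n_fft // 2, n_fft // 2), mode="reflect")
    # Hann window, zero-padded symmetrically to n_fft if it is shorter.
    window = np.hanning(win_length)
    left_pad = (n_fft - win_length) // 2
    window = np.pad(window, (left_pad, n_fft - win_length - left_pad))
    n_frames = 1 + (wav.size - n_fft) // hop_length
    frames = np.stack([wav[i * hop_length:i * hop_length + n_fft] for i in range(n_frames)])
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1))  # (n_frames, n_fft // 2 + 1)
    mel = magnitude @ mel_filterbank(sr, n_fft, n_mels).T      # (n_frames, n_mels)
    # Clamp before the log so silent frames do not produce -inf.
    return np.log(np.maximum(mel, eps))
```

With centred framing, one second of 16 kHz audio gives `1 + 16000 // 256 = 63` frames of 80 mel values. Libraries such as librosa or torchaudio provide equivalent, better-tested extractors. Whichever one you use for TriAAN-VC's input, use the same one for the vocoder's training features.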