hayeong0 / DDDM-VC

Official PyTorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
https://hayeong0.github.io/DDDM-VC-demo/

May I ask if there is a version of your model trained on the VCTK dataset? #8

Closed by ShangkunTu 3 months ago

ShangkunTu commented 4 months ago

Thank you for your wonderful work. However, the LibriTTS dataset is too large, and as a student I don't have enough computing resources to train on it. For the convenience of comparison, may I ask if there is a version of your model trained separately on the VCTK dataset? Thank you again.

hayeong0 commented 4 months ago

Hello, thank you for your interest.

We have never used VCTK for training, since we specifically target zero-shot scenarios. Because VCTK consists of high-quality samples, we believe a VCTK-trained model would yield better audio quality than our existing LibriTTS version. However, if training is conducted using only VCTK, I think the zero-shot performance on out-of-domain data, such as real-world recordings, may be relatively weak compared to training with LibriTTS data.

ShangkunTu commented 4 months ago

Thank you very much for your reply. I may try to train your model on VCTK, and I believe it will also achieve relatively good performance there, because it is an excellent piece of work. Thank you again!