hayeong0 / DDDM-VC

Official PyTorch Implementation for "DDDM-VC: Decoupled Denoising Diffusion Models with Disentangled Representation and Prior Mixup for Verified Robust Voice Conversion" (AAAI 2024)
https://hayeong0.github.io/DDDM-VC-demo/

DDDM-TTS #13

Closed: csu-roycheng closed this issue 2 weeks ago

csu-roycheng commented 2 months ago

Thanks for your great work!

I am very interested in the DDDM-TTS model. Will you open-source the code?

hayeong0 commented 2 months ago

Thank you for your interest in our work.

You can replace the content (w2v) features of the DDDM-VC model with the Text-to-Vec (TTV) part of DDDM-TTS, which encodes text into w2v features.

We currently have no plans to release DDDM-TTS separately, but you can use the TTV code available in our follow-up project, the HierSpeech++ repository.

Thanks!
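Because the TTV module stands in for the w2v features, its output presumably has to line up with the frame rate of the self-supervised model it replaces. The minimal pure-Python sketch below computes that frame count, assuming the standard wav2vec 2.0 convolutional feature encoder (seven unpadded 1-D convolutions) that XLS-R-style backbones use. The names `FEATURE_ENCODER`, `total_stride`, and `ssl_frame_count` are hypothetical helpers, not part of DDDM-VC or HierSpeech++, and how either repository actually aligns TTV output with w2v frames is not described in this thread.

```python
# Standard wav2vec 2.0 convolutional feature encoder, as (kernel, stride)
# pairs for its seven unpadded 1-D convolutions. Assumption: the SSL
# backbone (e.g. XLS-R or MMS) keeps this default front end.
FEATURE_ENCODER = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def total_stride() -> int:
    """Samples per output frame (the product of all strides)."""
    stride_product = 1
    for _, stride in FEATURE_ENCODER:
        stride_product *= stride
    return stride_product


def ssl_frame_count(num_samples: int) -> int:
    """Number of w2v frames the encoder emits for `num_samples` samples.

    Each unpadded convolution maps a length L to floor((L - k) / s) + 1;
    inputs shorter than the receptive field yield zero frames.
    """
    length = num_samples
    for kernel, stride in FEATURE_ENCODER:
        if length < kernel:
            return 0
        length = (length - kernel) // stride + 1
    return length
```

For example, `ssl_frame_count(16000)` returns 49 and `total_stride()` returns 320, i.e. one frame per 20 ms at 16 kHz, so a TTV model generating content features for one second of speech would need to emit about 49 frames.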

csu-roycheng commented 2 months ago

Thanks for your reply!

I found that "mms-300m" is used in HierSpeech++, while "XLS-R" is used in DDDM-VC to extract self-supervised speech representations. Does the choice between these two models have a significant impact on the results?

hayeong0 commented 2 months ago

@csu-roycheng

MMS was released after XLS-R and covers more languages. We trained the model with MMS in place of XLS-R and achieved slightly better performance in English. We're also now using MMS in our subsequent research to enhance multilingual and cross-lingual scalability!
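Since both backbones are distributed as wav2vec 2.0 checkpoints, swapping XLS-R for MMS at the feature-extraction step can be as small as changing the checkpoint name. Below is a minimal sketch using the Hugging Face `transformers` library, assuming the public checkpoint IDs `facebook/wav2vec2-xls-r-300m` and `facebook/mms-300m`. The helper names (`CHECKPOINTS`, `normalize`, `load_backbone`, `extract_ssl_features`) are hypothetical, and the layer index passed in is illustrative, not necessarily the layer DDDM-VC or HierSpeech++ actually extracts.

```python
import torch
from transformers import Wav2Vec2Model

# Public Hugging Face checkpoints for the two backbones discussed above.
# Both are 300M-parameter wav2vec 2.0 models, so they should share frame
# rate and feature width; confirm against each checkpoint's config.
CHECKPOINTS = {
    "xls-r": "facebook/wav2vec2-xls-r-300m",
    "mms": "facebook/mms-300m",
}


def normalize(wav: torch.Tensor) -> torch.Tensor:
    """Zero-mean, unit-variance normalization of each (batch, samples) row."""
    mean = wav.mean(dim=-1, keepdim=True)
    var = ((wav - mean) ** 2).mean(dim=-1, keepdim=True)
    return (wav - mean) / torch.sqrt(var + 1e-7)


def load_backbone(name: str) -> Wav2Vec2Model:
    """Download one of the CHECKPOINTS and put it in inference mode."""
    return Wav2Vec2Model.from_pretrained(CHECKPOINTS[name]).eval()


@torch.no_grad()
def extract_ssl_features(model: Wav2Vec2Model, wav_16k: torch.Tensor, layer: int) -> torch.Tensor:
    """Return hidden states from Transformer layer `layer` as (batch, frames, dim).

    `wav_16k` is a (batch, samples) float tensor at 16 kHz. Index 0 of
    `hidden_states` is the embedding fed to the first Transformer layer.
    """
    model.eval()
    out = model(normalize(wav_16k), output_hidden_states=True)
    return out.hidden_states[layer]
```

Usage would look like `extract_ssl_features(load_backbone("mms"), wav, layer=12)`, where `wav` is a (1, samples) tensor at 16 kHz and `layer=12` is only an example. Holding the layer index and the rest of the pipeline fixed while changing only the checkpoint name keeps an XLS-R vs. MMS comparison like the one described above controlled.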