interactiveaudiolab / ppgs

High-Fidelity Neural Phonetic Posteriorgrams

https://maxrmorrison.com/sites/ppgs

MIT License

Voice conversion #12

Closed: mudabek closed this issue 4 months ago

mudabek commented 7 months ago

Hey! Thank you for sharing your work!

In the demo there is an example of voice conversion, but I couldn't find a voice conversion function in the repo. Could you share how you ran voice conversion?

maxrmorrison commented 7 months ago

This repo does not contain a synthesis module that converts from input features to speech. That's coming soon (in a different repo). I'll keep this issue open until then.

u517872i commented 6 months ago

Hi! I am grateful for the work and am also wondering whether the voice conversion repo is still coming.

maxrmorrison commented 6 months ago

Yes, the repo containing VC is still coming. I do not have a fixed release date, but somewhere in the range of 2-6 months is probably correct.

The synthesis method in our paper "High-Fidelity Neural Phonetic Posteriorgrams" is relatively naive and not intended to be SOTA VC (yet). If I release the VC now, researchers would use that to discredit our entire PPG representation in favor of other representations. Once released, the VC repo will contain the additional details needed to finish the story.

Stay tuned! =)

lexkoro commented 5 months ago

Hi @maxrmorrison, could you provide some hints on how you used the PPG features to train VC? :) I'd be interested in whether we could also model durations using the PPG representations, for variable-duration VC.

maxrmorrison commented 4 months ago

Our follow-up paper is out! The system it describes is amenable to high-fidelity voice conversion with reconstruction quality on par with Mel vocoding. Code coming soon =)

If you'd like even more details, I provide the whole story (and more VC examples) in my thesis.

maxrmorrison commented 4 months ago

Code is released! Thanks for your patience