nii-yamagishilab / multi-speaker-tacotron

VCTK multi-speaker tacotron for ICASSP 2020
BSD 3-Clause "New" or "Revised" License

Can we get a cloned voice in real time? #11

Closed elmoundir-rohmat closed 1 year ago

elmoundir-rohmat commented 1 year ago

Hello,

I have 2 quick questions about what can be done using Tacotron.

What is the minimum training time (in minutes) required to get a good result? And can synthesis after training be instantaneous, i.e., can we get the cloned voice in real time? Happy new year, by the way!

Thank you !

ecooper7 commented 1 year ago

Hello, and happy new year to you, too!

Initializing from a well-trained single speaker model, the multi-speaker model can be well-trained in one day on a single GPU. Unfortunately due to licensing restrictions, we are not able to release our single-speaker model (trained on the Nancy data from the Blizzard Challenge 2011). Training from scratch on the VCTK data only took about four days.
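The warm-start idea described above (initializing the multi-speaker model from a single-speaker checkpoint) generally amounts to copying over only the parameters whose names and shapes match, while newly added parameters such as speaker embeddings keep their fresh initialization. The sketch below is purely illustrative and not from this codebase; the parameter names are made up, and state dicts are modeled as simple `name -> shape` mappings:

```python
# Hypothetical warm-start helper: select the checkpoint entries that can
# safely initialize the target model. Entries are kept only if the target
# model has a parameter with the same name and shape; anything new in the
# multi-speaker model (e.g. a speaker-embedding table) is left out and
# stays at its fresh initialization.

def loadable_params(target_shapes, checkpoint_shapes):
    """Return the subset of checkpoint entries matching the target model."""
    return {
        name: shape
        for name, shape in checkpoint_shapes.items()
        if target_shapes.get(name) == shape
    }

# Illustrative shapes only -- these names do not come from the repo.
single_speaker = {"encoder.weight": (256, 512), "decoder.weight": (1024, 80)}
multi_speaker = {
    "encoder.weight": (256, 512),
    "decoder.weight": (1024, 80),
    "speaker_embedding": (108, 64),  # new in the multi-speaker model
}

warm = loadable_params(multi_speaker, single_speaker)
# Shared encoder/decoder weights are loadable; the speaker embedding is not.
```

In an actual framework this would correspond to loading a filtered state dict non-strictly, so that missing keys (the new speaker-dependent parameters) are tolerated.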

Sorry, but this codebase does not support real-time processing.