I have tried OpenVoice, and it performs well for English TTS. However, I noticed that if I want to clone someone's voice, I need to run tone color conversion every time. Generating the speech takes about 200 ms, and the tone color conversion takes another 500 ms. Is there an easy way to save those 500 ms? For example, could I use a short audio file as the reference speaker, extract an embedding from it, and then use that embedding as source_se to generate speech directly with that tone color? In other words, I am asking whether there is a way to add a new base speaker from a few audio recordings of that speaker.