Closed: PetrochukM closed this issue 6 years ago
Could you upload your wav results?
How about a sample from LJSpeech that is not included in the training data?
You mean voice conversion?
Hi there!
Here are my results: https://drive.google.com/drive/folders/1RHfTJsY6wyqTSs3GisgUEtySy0nCaKf8?usp=sharing
I tried to convert a voice with arctic/si-close.
Thanks, Michael
@PetrochukM How did you generate these voices? By changing the speaker condition? They sound good.
The goal was to take the original speaker's voice and transform it into the arctic speaker's voice. That is not what I accomplished.
I think that if you want to transform other speech into the arctic speaker's voice, you have to train with a lot of speech.
You can look up the laboratory of the author kan-bayashi; they have been doing this for a long time.
@PetrochukM I'm not sure what you did, but maybe what you want to do is voice conversion. It may be interesting for you to check the papers related to the Voice Conversion Challenge: http://www.vc-challenge.org. We will also publish papers about a voice conversion system using the WaveNet vocoder soon.
@kan-bayashi Does the voice conversion system using the WaveNet vocoder need data from only one source speaker, so that it can take input speech from anyone and output a voice whose timbre matches that speaker?
@splinter22 Unfortunately, our current voice conversion system is based on the use of parallel data. We train a feature conversion model on parallel data, and the trained model then converts the source speaker's features into those of the target speaker. Finally, the converted features are fed into a speaker-independent (SI) WaveNet vocoder to generate speech.
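In rough pseudocode the pipeline looks like this (a toy, runnable sketch: the linear frame-wise converter and the zero-filled vocoder stub are illustrative stand-ins, not our actual models):

```python
# Toy sketch of the parallel-data VC pipeline: the linear converter and
# the vocoder stub are illustrative stand-ins, not the real models.
import numpy as np


class LinearFeatureConverter:
    """Frame-wise source -> target feature mapping, fit by least squares
    on time-aligned (parallel) feature sequences of shape (T, D)."""

    def fit(self, src, tgt):
        self.W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
        return self

    def convert(self, src):
        return src @ self.W


def si_wavenet_vocoder(feats, hop_length=256):
    """Stand-in for the speaker-independent WaveNet vocoder, which in the
    real system autoregressively generates a waveform conditioned only on
    the acoustic features (so it can voice any speaker)."""
    return np.zeros(len(feats) * hop_length, dtype=np.float32)


# 1) Train the conversion model on aligned source/target frames.
rng = np.random.default_rng(0)
aligned_src = rng.standard_normal((1000, 40))
aligned_tgt = aligned_src @ rng.standard_normal((40, 40))  # fake parallel data
converter = LinearFeatureConverter().fit(aligned_src, aligned_tgt)

# 2) Convert new source-speaker features, then 3) vocode the result.
converted = converter.convert(rng.standard_normal((200, 40)))
waveform = si_wavenet_vocoder(converted)
```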
@kan-bayashi Good job! How do I use the LJSpeech dataset to train the model?
A paper by Google that accomplishes speech-to-speech with Tacotron + WaveNet: https://google.github.io/tacotron/publications/end_to_end_prosody_transfer/index.html
@PetrochukM Thank you for the information. I will try to implement it.
@maozhiqiang I will add an LJSpeech example.
@kan-bayashi thank you!
@kan-bayashi Did you implement speech-to-speech?
Does it make sense to use the WaveNet vocoder as-is for speech-to-speech? For example, can I record my voice, generate a mel-spectrogram, and then use a model pre-trained on the LJSpeech dataset to re-speak it?
I've been trying this and the results don't sound good!
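For concreteness, this is roughly what I mean (a sketch: the mel parameters are assumptions that have to match whatever the pretrained model used, and the vocoder stub stands in for the real checkpoint):

```python
# Sketch of the record -> mel-spectrogram -> pretrained vocoder idea.
# The mel settings below are common assumptions; if they do not match the
# settings the LJSpeech model was trained with (sample rate, hop length,
# mel scale, normalization), the output will sound bad.
import librosa
import numpy as np


def synthesize_with_pretrained_vocoder(log_mel):
    """Hypothetical stand-in: load your LJSpeech-pretrained vocoder
    checkpoint and run its generation routine here."""
    raise NotImplementedError


# 1) Load my recording at the sample rate the LJSpeech model expects.
y, sr = librosa.load("my_voice.wav", sr=22050)

# 2) Extract a log-mel-spectrogram (assumed LJSpeech-style parameters).
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=1024, hop_length=256, n_mels=80
)
log_mel = np.log(np.maximum(mel, 1e-10)).T  # (frames, 80)

# 3) Re-speak it with the pretrained vocoder.
waveform = synthesize_with_pretrained_vocoder(log_mel)
```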