kan-bayashi / PytorchWaveNetVocoder

WaveNet vocoder implementation with PyTorch.
https://kan-bayashi.github.io/WaveNetVocoderSamples/
Apache License 2.0

Speech to Speech #22

Closed PetrochukM closed 6 years ago

PetrochukM commented 6 years ago

Does it make sense to use the WaveNet vocoder as-is for speech to speech? For example, can I record my voice, generate a mel spectrogram, and then use a model pre-trained on the LJSpeech dataset to respeak it?

I've been trying this and the results don't sound good!
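For reference, here is a minimal sketch of the pipeline being attempted, assuming librosa for feature extraction; the checkpoint path and the `load_pretrained_vocoder` / `generate` calls are hypothetical placeholders, since the exact loading API depends on the repo version:

```python
# Minimal sketch of the attempted pipeline (assumption: librosa features;
# the vocoder-loading calls below are hypothetical placeholders).
import librosa
import numpy as np


def extract_logmel(wav_path, sr=22050, n_fft=1024, hop_length=256, n_mels=80):
    """Compute a log-mel spectrogram from a recorded utterance."""
    y, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels)
    return np.log(np.maximum(mel, 1e-10)).T  # shape: (frames, n_mels)


# Likely failure mode: the vocoder was trained on LJSpeech features, so a new
# recording must use the same sr / n_fft / hop_length / n_mels and the same
# feature normalization statistics, or synthesis quality degrades badly.
logmel = extract_logmel("my_voice.wav")
# vocoder = load_pretrained_vocoder("ljspeech_checkpoint.pkl")  # hypothetical
# waveform = vocoder.generate(logmel)                           # hypothetical
```

Even with matched features, a vocoder trained only on LJSpeech has never seen another speaker's spectral characteristics, which could explain poor results.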

jiqizaisikao commented 6 years ago

Could you upload your wav result?

kan-bayashi commented 6 years ago

How about trying a sample from LJSpeech that is not included in the training data?

splinter21 commented 6 years ago

You mean voice conversion?

PetrochukM commented 6 years ago

Hi there!

Here are my results: https://drive.google.com/drive/folders/1RHfTJsY6wyqTSs3GisgUEtySy0nCaKf8?usp=sharing

I tried to convert a voice with arctic/si-close.

Thanks, Michael

jiqizaisikao commented 6 years ago

@PetrochukM How did you generate these voices? By changing the speaker condition? It sounds good.

PetrochukM commented 6 years ago

The goal was to take the original speaker's voice and transform it into the arctic speaker's voice. That is not what I accomplished.

jiqizaisikao commented 6 years ago

I think that if you want to transform other speech into the arctic speaker's voice, you have to train with a lot of speech data.

jiqizaisikao commented 6 years ago

You can look up the laboratory of the author kan-bayashi; they have been working on this for a long time.

kan-bayashi commented 6 years ago

@PetrochukM I'm not sure what you did, but maybe what you want to do is voice conversion. You may find it interesting to check the papers related to the Voice Conversion Challenge: http://www.vc-challenge.org. We will also publish papers about a voice conversion system using the WaveNet vocoder soon.

splinter21 commented 6 years ago

@kan-bayashi Does the voice conversion system using the WaveNet vocoder need data from only one source speaker, so that it can take input voice from anyone and output voice whose timbre is that of the source speaker?

kan-bayashi commented 6 years ago

@splinter21 Unfortunately, our current voice conversion system is based on the use of parallel data. We train a feature conversion model using parallel data, and the trained model then converts the features of the source speaker into those of the target speaker. Finally, the converted features are fed into the speaker-independent (SI) WaveNet vocoder to generate speech.
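A rough sketch of that pipeline, assuming DTW alignment via librosa and a deliberately simple frame-wise network standing in for the actual conversion model; the final vocoder call is a hypothetical placeholder:

```python
# Rough sketch of parallel-data voice conversion: DTW-align source/target
# features of the same sentences, train a frame-wise conversion model, then
# hand converted features to a speaker-independent WaveNet vocoder.
import numpy as np
import librosa
import torch
import torch.nn as nn


def align(src_feats, tgt_feats):
    """DTW-align two (frames, dim) feature sequences of the same sentence."""
    _, wp = librosa.sequence.dtw(X=src_feats.T, Y=tgt_feats.T)
    wp = wp[::-1]  # librosa returns the warping path end-to-start
    return src_feats[wp[:, 0]], tgt_feats[wp[:, 1]]


# Frame-wise conversion model (a simple stand-in; real systems use
# e.g. GMMs or sequence models over spectral features).
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80))
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()


def train_step(src_feats, tgt_feats):
    """One training step on a parallel utterance pair."""
    src, tgt = align(src_feats, tgt_feats)
    pred = model(torch.from_numpy(src).float())
    loss = loss_fn(pred, torch.from_numpy(tgt).float())
    optim.zero_grad()
    loss.backward()
    optim.step()
    return loss.item()


# At conversion time:
#   converted = model(torch.from_numpy(src_feats).float())
#   waveform = si_wavenet_vocoder.generate(converted)  # hypothetical call
```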

maozhiqiang commented 6 years ago

@kan-bayashi Good job! How can I use the LJSpeech dataset to train the model?

PetrochukM commented 6 years ago

Here is a paper by Google that accomplishes speech-to-speech with Tacotron + WaveNet: https://google.github.io/tacotron/publications/end_to_end_prosody_transfer/index.html

kan-bayashi commented 6 years ago

@PetrochukM Thank you for the information. I will try to implement it.

@maozhiqiang I will add an LJSpeech example.

maozhiqiang commented 6 years ago

@kan-bayashi thank you!

PetrochukM commented 6 years ago

@kan-bayashi Did you implement speech-to-speech?