k2kobayashi / crank

A toolkit for non-parallel voice conversion based on vector-quantized variational autoencoder
MIT License
169 stars 31 forks

Why my generated waveform on VCC2020 is slower than original one #51

Closed Mortyzhou-Shef-BIT closed 3 years ago

Mortyzhou-Shef-BIT commented 3 years ago

[image] This is the generated one: the speech is twice as slow as the original... and my PWG vocoder was only trained up to 200000.ckpt.

This is the duration of the same waveform for the original and your sample: [image]

I used the default configuration and setup... Could you help me fix this problem? Will it affect the MCD and MOSNet results?

unilight commented 3 years ago

Hi @zhouyh-jlu , did you train your own PWG? You may check the hop size and upsampling stride numbers. And yes, it will affect the MCD and MOSNet results.
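
For anyone landing here later, the check suggested above can be sketched roughly as follows. This is only an illustration, not code from crank or ParallelWaveGAN: the function names are hypothetical, and the numbers (a feature hop size of 128 against a vocoder hop size of 256) are made-up values chosen to reproduce a 2x slowdown, not the settings from this issue. The idea is that the vocoder emits `prod(upsample_scales)` samples per input frame, so that product should equal the vocoder's hop size, and the hop size used for feature extraction should match it too.

```python
from math import prod


def expected_duration_ratio(feature_hop_size, upsample_scales):
    """Return generated duration / original duration.

    The features have one frame per `feature_hop_size` samples, while the
    vocoder generates prod(upsample_scales) samples per frame.
    """
    return prod(upsample_scales) / feature_hop_size


def check_hop_size(feature_hop_size, vocoder_hop_size, upsample_scales):
    """Return a list of human-readable mismatch messages (empty if consistent)."""
    problems = []
    total_upsampling = prod(upsample_scales)
    if total_upsampling != vocoder_hop_size:
        problems.append(
            f"vocoder hop size {vocoder_hop_size} != product of "
            f"upsampling scales {total_upsampling}"
        )
    if feature_hop_size != vocoder_hop_size:
        problems.append(
            f"feature hop size {feature_hop_size} != vocoder hop size "
            f"{vocoder_hop_size}"
        )
    return problems


if __name__ == "__main__":
    # Hypothetical values: features extracted with hop 128, vocoder built for hop 256.
    feature_hop, vocoder_hop, scales = 128, 256, [4, 4, 4, 4]
    for message in check_hop_size(feature_hop, vocoder_hop, scales):
        print("mismatch:", message)
    print("duration ratio:", expected_duration_ratio(feature_hop, scales))
```

With these illustrative values the ratio is 2.0, i.e. the generated waveform would be twice as long as the original. To use this on a real setup, read the three values from the feature-extraction config and the vocoder's training config.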

Mortyzhou-Shef-BIT commented 3 years ago

Yes, I found that I had used the PWG from espnet's vcc2020, and there were some issues with the hop size. Thank you for your help.

talka1 commented 2 years ago

So what exactly did you do to fix the problem with the slowed-down generated eval wavs after step 6?