unilight / seq2seq-vc

A sequence-to-sequence voice conversion toolkit.

MIT License

speech speed #14

Closed chiaki-luo closed 5 months ago

chiaki-luo commented 5 months ago

Dear author, the speech decoded in Stage 6 sounds faster than the speech decoded in Stage 4. How can I fix this?

unilight commented 5 months ago

Hi @chiaki-luo, can you tell me which recipe you are running? It would also be helpful if you could attach samples.

chiaki-luo commented 5 months ago

Thank you for your response, @unilight. I am running the L2-arctic recipe. Something seems to be wrong with the output of the VC model. conversion.zip

unilight commented 5 months ago

Hi @chiaki-luo, which recipe under l2-arctic are you running?

chiaki-luo commented 5 months ago

@unilight cascade

unilight commented 5 months ago

Hi @chiaki-luo, I noticed that in the zip file you attached, the stage6_arctic_b0440.wav file has a sampling rate of 24kHz. Can you tell me how you generated that sample? If you used the PWG I provided, it should generate 16kHz outputs.
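The sampling rate check mentioned above can be reproduced with a few lines of Python. This is only a minimal sketch using the standard-library `wave` module, which reads PCM WAV headers; the helper names `wav_sample_rate` and `check_sample_rate` are made up for illustration and are not part of seq2seq-vc. The default of 16000 Hz follows the expected PWG output rate stated in the comment above.

```python
import sys
import wave


def wav_sample_rate(path):
    """Return the sampling rate (in Hz) stored in a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def check_sample_rate(path, expected_hz=16000):
    """Return (actual_hz, matches) for the WAV file at `path`.

    `expected_hz` defaults to 16000, the rate the provided PWG vocoder
    is said to produce in this thread.
    """
    actual_hz = wav_sample_rate(path)
    return actual_hz, actual_hz == expected_hz


if __name__ == "__main__":
    # Usage: python check_rate.py stage6_arctic_b0440.wav [more.wav ...]
    for wav_path in sys.argv[1:]:
        rate, ok = check_sample_rate(wav_path)
        status = "ok" if ok else "unexpected"
        print(f"{wav_path}: {rate} Hz ({status})")
```

Running it on the Stage 4 and Stage 6 outputs side by side should show whether the two stages were synthesized at different rates, which would point at the vocoder rather than the conversion model.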

chiaki-luo commented 5 months ago

I skipped the evaluation and just ran ./run.sh --stage 4 --stop_stage 4 and ./run.sh --stage 6 --stop_stage 6. I used PWG_THXC. Is there a hyperparameter that sets the sampling rate of the generated audio?

unilight commented 5 months ago

Can you show me this file: exp/TXHC_bdl_1032_vtn.tts_pt.v1/results/checkpoint-50000steps/stage2_ppg_sxliu_checkpoint-50000steps_TXHC_dev/decode.log? (This is the log of the stage 2 decoding)

chiaki-luo commented 5 months ago

@unilight I downloaded the checkpoint from Hugging Face manually. Maybe I downloaded the wrong checkpoint? decode.log

unilight commented 5 months ago

From decode.log I see you are using /media/hello/xiaolan_T7/seq2seq-vc-main/egs/l2-arctic/cascade/downloads/hifigan/checkpoint-2500000steps.pkl, which is a HiFiGAN vocoder. I don't think I provided a HiFiGAN vocoder... Where did you download it? Please run stage -1. The script automatically downloads all the pre-trained models.
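To see which checkpoint files a decoding run actually loaded, one option is to scan the log for `.pkl` paths. The sketch below makes no assumption about the decode.log format beyond the checkpoint path appearing somewhere in the text; `find_checkpoint_paths` is a hypothetical helper written for this example, not a function from the toolkit.

```python
import re

# Any run of characters (excluding whitespace and quotes) ending in ".pkl".
CHECKPOINT_PATTERN = re.compile(r"[^\s'\"]+\.pkl")


def find_checkpoint_paths(log_text):
    """Return the unique .pkl paths in `log_text`, in order of first appearance."""
    paths = []
    for match in CHECKPOINT_PATTERN.findall(log_text):
        if match not in paths:
            paths.append(match)
    return paths


if __name__ == "__main__":
    import sys

    # Usage: python find_ckpt.py path/to/decode.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for path in find_checkpoint_paths(log_file.read()):
            print(path)
```

A path containing a directory such as `hifigan`, as in the log discussed here, is a quick hint that the vocoder is not the one the recipe downloads.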

chiaki-luo commented 5 months ago

@unilight Thank you for your help. It was my mistake: I had changed the vocoder. I appreciate it.