r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/

local and global conditioning features for cmu-arctic dataset results #179

Closed shiva1393 closed 4 years ago

shiva1393 commented 4 years ago

Hi, I trained a multi-speaker WaveNet on the CMU ARCTIC dataset, but I didn't get reconstructed wave files like the samples at https://r9y9.github.io/wavenet_vocoder/. For LJSpeech with local conditioning features I got good results.

I followed the v0.1.1 code (https://github.com/r9y9/wavenet_vocoder/releases/tag/v0.1.1) for multi-speaker training. Following the README:

Step 1: python preprocess.py cmu_arctic $db_root $data_feats --preset=multispeaker_cmu_arctic_mixture.json
Step 2: python train.py --data-root=$data_feats

Please tell me what I have to change (or where I am making a mistake) when training the multi-speaker WaveNet.
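For reference, a sketch of the two steps as I ran them (paths are placeholders; the preset path under presets/ is an assumption on my side):

```sh
# Step 1: extract mel-spectrogram (local) and speaker-id (global) features.
# $db_root = downloaded CMU ARCTIC directory, $data_feats = output directory.
python preprocess.py cmu_arctic $db_root $data_feats \
    --preset=presets/multispeaker_cmu_arctic_mixture.json

# Step 2: train on the extracted features. Passing the same preset here is
# my assumption; the README step above omits it.
python train.py --data-root=$data_feats \
    --preset=presets/multispeaker_cmu_arctic_mixture.json
```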

r9y9 commented 4 years ago

Could you provide more details?

shiva1393 commented 4 years ago

wavenet_vocoder v0.1.1: WN conditioned on mel-spectrogram and speaker embedding (16 kHz). Dataset: cmu_arctic. Params: hparams="cin_channels=80,gin_channels=16,n_speakers=7". I used presets/multispeaker_cmu_arctic_mixture.json, which has the same parameters as used in https://r9y9.github.io/wavenet_vocoder/. For preprocessing I used cmu_arctic.py. Predicted wave file after 5 lakh steps: https://drive.google.com/file/d/1yQaLhiUT0C-_tWeglrBNtigV1r0LB2IR/view?usp=sharing
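The same settings could also be passed as command-line overrides instead of through the preset file (a sketch; I actually used the preset JSON itself):

```sh
# Hypothetical equivalent: pass the conditioning dimensions as hparams
# overrides rather than editing the preset JSON.
python train.py --data-root=$data_feats \
    --preset=presets/multispeaker_cmu_arctic_mixture.json \
    --hparams="cin_channels=80,gin_channels=16,n_speakers=7"
```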

Sorry, I think I am not giving enough details. Please ask exactly which details you need.

After preprocessing, do we need to do normalization? Does normalization affect the results much? I didn't use normalization because I didn't get a meanvar.joblib file after running preprocess.py cmu_arctic data_root.
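This is roughly how I checked for it (a sketch):

```sh
# List what preprocess.py wrote into the feature directory; I expected
# normalization statistics (a meanvar.joblib file) to appear here, but
# no such file was produced.
ls $data_feats
```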

r9y9 commented 4 years ago

I don't have access to the Google Drive file. I'm not sure what you mean by "didn't get reconstructed wave". Were samples generated but with bad quality? How bad was it? Or were the files not found?

What do you mean by 5 lakh steps? How do you generate samples? Did you use evaluate.py or synthesis.py? What are the exact commands? More details, please.

In v0.1.1, normalization is done in preprocess.py, so you don't need to worry about it. I believe it doesn't have a big impact on quality.

shiva1393 commented 4 years ago

Thanks for your reply. Google Drive links:
Target wave file: https://drive.google.com/open?id=1llTVhzJ9KcgVN50Oa_ckFKUiuHlFckvJ
Predicted wave file: https://drive.google.com/open?id=1DcDJL4bok3LuhM6Jlhf0l7OFGuVHt7v9

"Didn't get reconstructed wave" means the samples are generated but with bad quality.

For reconstructing the wave files I used: python evaluate.py $eval_checkpoint $dst_dir --preset=presets/multispeaker_cmu_arctic_mixture.json

Used "nepochs": 2000, Actually problem is not in evaluation because while training the network train_eval folder i checked. intermediate wave files are not good. I did not change any parameters in multispeaker_cmu_arctic_mixture.json file.

Exact commands for how I trained and evaluated (db_root = data directory of cmu_arctic; multispeaker_cmu_arctic_mixture.json unchanged, link: https://drive.google.com/open?id=1Q2AoaruoFBzGhN_EiwzMnMTsGyyCGHTx):

Step 1: python preprocess.py cmu_arctic $db_root $data_feats --preset=presets/multispeaker_cmu_arctic_mixture.json

Step 2: python train.py --data-root=$data_feats --preset=presets/multispeaker_cmu_arctic_mixture.json

Step 3: (eval_set = 10 random wav files taken from cmu_arctic; eval_checkpoint = checkpoint directory; dst_dir = destination directory)
python evaluate.py $eval_checkpoint $dst_dir --preset=$hparams --hparams="batch_size=$inference_batch_size" --data-root=$eval_set
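Putting it all together, the full run looked roughly like this (variable values are placeholders; inference_batch_size is whatever I set for generation):

```sh
# Placeholders (my actual paths differ)
db_root=/path/to/cmu_arctic                           # raw dataset
data_feats=/path/to/extracted_feats                   # preprocess.py output
hparams=presets/multispeaker_cmu_arctic_mixture.json  # unchanged preset
eval_checkpoint=/path/to/checkpoints/checkpoint.pth   # trained model
dst_dir=/path/to/generated                            # output wavs
eval_set=/path/to/eval_feats                          # 10 random utterances
inference_batch_size=4                                # assumption

# Step 1: feature extraction
python preprocess.py cmu_arctic $db_root $data_feats --preset=$hparams

# Step 2: training
python train.py --data-root=$data_feats --preset=$hparams

# Step 3: generation with the trained checkpoint
python evaluate.py $eval_checkpoint $dst_dir \
    --preset=$hparams \
    --hparams="batch_size=$inference_batch_size" \
    --data-root=$eval_set
```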

r9y9 commented 4 years ago

Seems like the samples are not too bad. Generally speaking, MoL models require more iterations (~1000k steps) to converge than discrete-output models. You could try training the model longer or switch to the discrete-output version of WaveNet instead.
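For the discrete-output route, the switch is a matter of changing the output-related hparams and retraining; roughly like this (hparam names as I recall them from hparams.py, so double-check):

```sh
# Rough sketch: retrain with mu-law quantized (256-way softmax) output.
# input_type / quantize_channels / out_channels are the hparams I believe
# control this; verify against hparams.py in v0.1.1.
python train.py --data-root=$data_feats \
    --preset=presets/multispeaker_cmu_arctic_mixture.json \
    --hparams="input_type=mulaw-quantize,quantize_channels=256,out_channels=256"
```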

shiva1393 commented 4 years ago

Thank you. I will check.