r9y9 / wavenet_vocoder

WaveNet vocoder
https://r9y9.github.io/wavenet_vocoder/

training #9

Closed AzamRabiee closed 6 years ago

AzamRabiee commented 6 years ago

Thanks for sharing your comprehensive code! I've just started reading and running it. In hparams, nepochs is set to 2000, but training seems to stop after the first epoch, as shown in TensorBoard: [TensorBoard screenshot]. It seems the training loop depends only on nepochs. Is there any other parameter I need to set to keep the training process going?

r9y9 commented 6 years ago

That's weird. Could you share the log output?

AzamRabiee commented 6 years ago

log.zip

r9y9 commented 6 years ago

Sorry, I meant stdout/stderr of train.py, not tf logs.

r9y9 commented 6 years ago

If nepochs equals 2000, then it should not stop at epoch 0.
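For context, the number of passes over the data here is governed by nepochs alone; below is a minimal sketch of that kind of loop, with illustrative names and a dummy dataset rather than the verbatim train.py code:

# Hypothetical simplification of an nepochs-driven training loop;
# checkpointing and evaluation are omitted.
nepochs = 2000
data_loader = [([0.0] * 4, [0.0] * 4)] * 10  # dummy stand-in for the DataLoader

global_epoch = 0
while global_epoch < nepochs:  # the only stopping condition
    for x, y in data_loader:
        pass                   # one optimization step would happen here
    global_epoch += 1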

AzamRabiee commented 6 years ago

Part of stdout:

Command line args: {'--checkpoint': None, '--checkpoint-dir': 'checkpoints', '--data-root': './data/cmu_arctic/', '--help': False, '--hparams': '', '--log-event-path': None, '--reset-optimizer': False, '--restore-parts': None, '--speaker-id': None}
Hyperparameters:
  adam_beta1: 0.9
  adam_beta2: 0.999
  adam_eps: 1e-08
  batch_size: 1
  builder: wavenet
  checkpoint_interval: 10000
  cin_channels: 80
  clip_thresh: 1.0
  dropout: 0.050000000000000044
  fft_size: 1024
  frame_shift_ms: None
  freq_axis_kernel_size: 3
  gate_channels: 512
  gin_channels: -1
  hop_size: 256
  initial_learning_rate: 0.001
  kernel_size: 3
  layers: 16
  lr_schedule: noam_learning_rate_decay
  lr_schedule_kwargs: {}
  max_time_sec: None
  max_time_steps: 20000
  min_level_db: -100
  n_speakers: 7
  name: wavenet_vocoder
  nepochs: 2000
  num_mels: 80
  num_workers: 2
  pin_memory: True
  preset:
  presets: {}
  random_state: 1234
  ref_level_db: 20
  residual_channels: 256
  sample_rate: 16000
  save_optimizer_state: True
  silence_threshold: 2
  skip_out_channels: 256
  stacks: 2
  test_eval_epoch_interval: 5
  test_num_samples: None
  test_size: 0.0441
  train_eval_interval: 10000
  upsample_conditional_features: True
  upsample_scales: [16, 16]
  weight_decay: 0.0
  weight_normalization: True
Local conditioning enabled. Shape of a sample: (179, 80).
[train]: length of the dataset is 7580
Speaker stats: {0: 1092, 3: 1081, 4: 1089, 6: 1079, 5: 1073, 1: 1095, 2: 1071}
Local conditioning enabled. Shape of a sample: (123, 80).
[test]: length of the dataset is 350
Speaker stats: {5: 59, 1: 37, 4: 43, 6: 53, 2: 61, 0: 46, 3: 51}
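As the '--hparams' entry above suggests, individual hyperparameters such as nepochs can be overridden from the command line. A hedged example, assuming the repo's hparams object follows the tf.contrib.training.HParams convention (TensorFlow 1.x) with its comma-separated name=value parse format:

# Hedged example: parse() is the standard tf.contrib.training.HParams API;
# whether this repo wires --hparams through it exactly like this is an
# assumption, not confirmed by the thread.
from hparams import hparams  # the repo's own hparams module

hparams.parse("nepochs=2000,batch_size=2")
print(hparams.nepochs)  # 2000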

AzamRabiee commented 6 years ago

Sorry, I've found the error:

  File "train.py", line 456, in eval_model
    save_waveplot(path, y_hat, y_target)
  File "train.py", line 405, in save_waveplot
    plt.figure(figsize=(16, 6))
  ...
_tkinter.TclError: no display name and no $DISPLAY environment variable

r9y9 commented 6 years ago

It seems there are no error messages. What problem are you seeing? I'm not sure I understand you.

r9y9 commented 6 years ago

Ah, ok,

import matplotlib
matplotlib.use('Agg')  # select a non-interactive backend before importing pyplot
import matplotlib.pyplot as plt

I think this fixes your issue.
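A runnable headless check of the same idea (illustrative code, not the repo's save_waveplot): with the Agg backend selected before pyplot is imported, figures render straight to files and no X display is needed.

import matplotlib
matplotlib.use('Agg')                   # must come before the pyplot import
import matplotlib.pyplot as plt
import numpy as np

y = np.sin(np.linspace(0, 20, 16000))   # dummy waveform
plt.figure(figsize=(16, 6))
plt.plot(y)
plt.savefig("waveplot.png")             # writes the figure; no window is opened
plt.close()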

AzamRabiee commented 6 years ago

Your suggestion does not work, though; the same error is raised! Anyway, I can continue training for more epochs without saving the plot.

r9y9 commented 6 years ago

https://stackoverflow.com/questions/37604289/tkinter-tclerror-no-display-name-and-no-display-environment-variable

Did you put the snippet at the top of train.py?

AzamRabiee commented 6 years ago

Yep! It's solved by adding the snippet and 'export DISPLAY=mymachine.com:0.0'. Thanks!

AzamRabiee commented 6 years ago

Are 2000 epochs (nepochs) really required? I got fairly good results with only 30 epochs!

r9y9 commented 6 years ago

No, I just set a big number. You can stop training at any time by hitting Ctrl+C.
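A hedged sketch of why stopping with Ctrl+C is reasonable: a train.py-style script can catch KeyboardInterrupt and save a final checkpoint on the way out (illustrative names; not the verbatim code from this repo):

import time

def save_checkpoint(step):
    # stand-in for torch.save(...) of model/optimizer state
    print("saved checkpoint at step", step)

step = 0
try:
    while True:           # stands in for the nepochs loop
        time.sleep(0.1)   # stands in for one training step
        step += 1
except KeyboardInterrupt:
    save_checkpoint(step)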

r9y9 commented 6 years ago

I'm closing this.

zctang commented 5 years ago

@r9y9 Hi Ryuichi, thanks for your amazing work. After reading your README, I have a couple of questions.

  1. You describe three training setups: unconditional WaveNet, WaveNet conditioned on mel-spectrograms, and WaveNet conditioned on both mel-spectrograms and a speaker embedding. When you trained your pre-trained multi-speaker model (the cmu_arctic data), which setup did you use (conditional WaveNet with speaker embedding)? I understand the three setups as follows:
     - unconditional WaveNet is for a single speaker only, like the LJSpeech data;
     - WaveNet conditioned on mel-spectrograms is for multi-speaker data but trains on one specific speaker, like cmu_arctic with speaker-id awb;
     - WaveNet conditioned on both mel-spectrograms and a speaker embedding is for multi-speaker data and trains on all speakers.
     Is my understanding correct? Thank you.
  2. After downloading the cmu_arctic data, I see it has 18 different speakers, but your code only provides 7. Where in your code can I find the setting for these 7 speakers? Why did you choose these specific 7? Was it random?

Thank you so much.

r9y9 commented 5 years ago

@zctang I'm not sure if I understand you correctly. For the conditioning settings, you can find what I used at https://r9y9.github.io/wavenet_vocoder/.

2: Only 7 speakers were available when I ran the experiments (http://festvox.org/cmu_arctic/cmu_arctic/). You can use all 18 speakers if you want.
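For readers skimming this thread, here is a hedged summary of how the three conditioning modes map onto the hyperparameters shown earlier (cin_channels for local/mel conditioning, gin_channels and n_speakers for speaker embeddings); the exact values are illustrative, not prescriptive:

conditioning = {
    # no conditioning inputs at all
    "unconditional":    dict(cin_channels=-1, gin_channels=-1),
    # mel-spectrogram conditioning only (single-speaker training)
    "mel_only":         dict(cin_channels=80, gin_channels=-1),
    # mel-spectrogram + learned speaker embedding (multi-speaker training)
    "mel_plus_speaker": dict(cin_channels=80, gin_channels=16, n_speakers=7),
}
print(conditioning["mel_plus_speaker"])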

zctang commented 5 years ago

@r9y9 Thanks, I understand the first question now. However, for the second question: if I want to use your code with all 18 speakers, do I also need to modify the code in nnmnkwii? I ask because I saw the cmu_arctic.py in it. [screenshot of nnmnkwii's cmu_arctic.py]
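A hedged way to check which speakers nnmnkwii knows about (module and attribute names assumed from its CMU Arctic support; verify against your installed version):

from nnmnkwii.datasets import cmu_arctic

# the speaker IDs nnmnkwii's CMU Arctic loader supports
print(cmu_arctic.available_speakers)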