mirfan899 closed this issue 4 years ago.
Try increasing the value of max_iters, and make sure the values of batch_size and outputs_per_step are appropriate.
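One rough way to act on this advice is to find the longest target in the training metadata and work out the smallest max_iters that covers it. This is a minimal sketch, assuming each line of train.txt has the form `spec_file|mel_file|n_frames|text` (frame count in the third field). `min_max_iters` is a hypothetical helper, not part of the Tacotron code:

```python
import math
from typing import Iterable


def min_max_iters(metadata_lines: Iterable[str], outputs_per_step: int) -> int:
    """Smallest max_iters whose decoder output covers the longest target.

    Assumption: each line looks like 'spec_file|mel_file|n_frames|text',
    so the frame count is the third '|'-separated field.
    """
    longest = 0
    for line in metadata_lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        n_frames = int(line.split("|")[2])
        longest = max(longest, n_frames)
    # The decoder emits outputs_per_step frames per iteration, so round up.
    return math.ceil(longest / outputs_per_step)
```

For example, `min_max_iters(open("./training/train.txt", encoding="utf-8"), outputs_per_step=5)`. Leave some headroom above the returned value, since the data feeder may pad targets by a few frames.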
Well, it was a cuDNN version issue: I had 7.0, but Tacotron required 7.1.4. After updating cuDNN, I'm getting this error:
Checkpoint path: ./logs-tacotron/model.ckpt
Loading training data from: ./training/train.txt
Using model: tacotron
Hyperparameters:
adam_beta1: 0.9
adam_beta2: 0.999
attention_depth: 256
batch_size: 32
cleaners: english_cleaners
decay_learning_rate: True
decoder_depth: 256
embed_depth: 256
encoder_depth: 256
frame_length_ms: 50
frame_shift_ms: 12.5
griffin_lim_iters: 60
initial_learning_rate: 0.002
max_iters: 200
min_level_db: -100
num_freq: 1025
num_mels: 80
outputs_per_step: 5
postnet_depth: 256
power: 1.5
preemphasis: 0.97
prenet_depths: [256, 128]
ref_level_db: 20
sample_rate: 20000
use_cmudict: False
Loaded metadata for 850 examples (1.81 hours)
Initialized Tacotron model. Dimensions:
embedding: 256
prenet out: 128
encoder out: 256
attention out: 256
concat attn & out: 512
decoder cell out: 256
decoder out (5 frames): 400
decoder out (1 frame): 80
postnet out: 256
linear out: 1025
Starting new training run at commit: None
Generated 32 batches of size 32 in 46.279 sec
Step 1 [60.680 sec/step, loss=0.82188, avg_loss=0.82188]
Step 2 [32.100 sec/step, loss=0.81987, avg_loss=0.82088]
Step 3 [22.420 sec/step, loss=0.82569, avg_loss=0.82248]
Step 4 [17.712 sec/step, loss=0.81703, avg_loss=0.82112]
Exiting due to exception: Incompatible shapes: [32,1285,80] vs. [32,1000,80]
[[node model/loss/sub (defined at /home/virtuoso_irfan/tacotron/models/tacotron.py:118) = Sub[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](datafeeder/input_queue_Dequeue/_21, model/inference/Reshape)]]
[[{{node model/optimizer/gradients/model/inference/post_cbhg/highway_2/H/Tensordot/Reshape_grad/Shape/_461}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_4962_...grad/Shape", tensor_type=DT_INT32, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
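The issue was fixed by setting max_iters=300 in hparams.py. With outputs_per_step=5, the decoder emits at most max_iters × 5 frames: 200 × 5 = 1000 frames is shorter than the 1285-frame targets in the error above, while 300 × 5 = 1500 frames is long enough.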
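The same limit can also be read as a maximum clip duration, assuming each output frame advances by frame_shift_ms (12.5 ms in the hyperparameters logged above). A minimal sketch; both helper names are hypothetical:

```python
def max_supported_seconds(max_iters: int, outputs_per_step: int, frame_shift_ms: float) -> float:
    """Longest target (in seconds) the decoder can cover before hitting max_iters."""
    max_frames = max_iters * outputs_per_step
    return max_frames * frame_shift_ms / 1000.0


def frames_to_seconds(n_frames: int, frame_shift_ms: float) -> float:
    """Duration represented by n_frames spectrogram frames."""
    return n_frames * frame_shift_ms / 1000.0


if __name__ == "__main__":
    frame_shift_ms = 12.5  # value from the logged hyperparameters
    print(max_supported_seconds(200, 5, frame_shift_ms))  # 12.5
    print(max_supported_seconds(300, 5, frame_shift_ms))  # 18.75
    print(frames_to_seconds(1285, frame_shift_ms))        # 16.0625
```

So with max_iters=200 the decoder cannot cover targets longer than about 12.5 s, while the roughly 16 s batch from the error fits within the 18.75 s allowed by max_iters=300.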
I'm trying to train the model on Ubuntu 16.04 with a Tesla K80 GPU, CUDA 9, cuDNN 7, and tensorflow-gpu==1.12, and I get this issue when I try to train it.