keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License
2.96k stars, 957 forks

Incompatible shapes: [32,1025,80] vs. [32,1000,80] #260

Closed: peter05010402 closed this issue 5 years ago

peter05010402 commented 5 years ago

Starting new training run at commit: None
Generated 32 batches of size 32 in 5.544 sec
Step 1 [14.291 sec/step, loss=0.84169, avg_loss=0.84169]
Step 2 [8.264 sec/step, loss=0.85866, avg_loss=0.85018]
Step 3 [6.260 sec/step, loss=0.81893, avg_loss=0.83976]
Step 4 [5.206 sec/step, loss=0.81658, avg_loss=0.83397]
Step 5 [4.307 sec/step, loss=0.72014, avg_loss=0.81120]
Step 6 [3.935 sec/step, loss=0.83339, avg_loss=0.81490]
Step 7 [3.709 sec/step, loss=0.85086, avg_loss=0.82004]
Step 8 [3.466 sec/step, loss=0.84282, avg_loss=0.82288]
Step 9 [3.163 sec/step, loss=0.74776, avg_loss=0.81454]
Step 10 [3.095 sec/step, loss=0.79225, avg_loss=0.81231]
Step 11 [3.074 sec/step, loss=0.83420, avg_loss=0.81430]
Step 12 [3.008 sec/step, loss=0.83721, avg_loss=0.81621]
Exiting due to exception: Incompatible shapes: [32,1025,80] vs. [32,1000,80]

begeekmyfriend commented 5 years ago

During eval and training, audio length is limited to max_iters * outputs_per_step * frame_shift_ms milliseconds. With the defaults (max_iters=200, outputs_per_step=5, frame_shift_ms=12.5), this is 12.5 seconds.
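To make the arithmetic concrete, here is a minimal sketch of the limit described above. The 1000 in the reported shape matches the default frame cap (200 * 5), which suggests that at least one clip in the batch has 1025 frames. The helper names and the sample frame counts are hypothetical and are not part of the repository.

```python
# Hypothetical helpers for illustration; these names do not exist in the repo.

def max_frames(max_iters=200, outputs_per_step=5):
    """Largest number of spectrogram frames the decoder can emit."""
    return max_iters * outputs_per_step


def max_audio_ms(max_iters=200, outputs_per_step=5, frame_shift_ms=12.5):
    """Longest audio clip, in milliseconds, that fits under the frame cap."""
    return max_frames(max_iters, outputs_per_step) * frame_shift_ms


def clips_over_limit(frame_counts, limit):
    """Return (index, n_frames) for every clip longer than the limit."""
    return [(i, n) for i, n in enumerate(frame_counts) if n > limit]


limit = max_frames()
print(limit)                    # frame cap with the default hyperparameters
print(max_audio_ms() / 1000.0)  # the same cap expressed in seconds

# Made-up frame counts; the second clip is the kind that would trigger the error.
print(clips_over_limit([640, 1025, 998], limit))
```

If a check like this flags clips in your dataset, the usual ways out are to raise max_iters or to drop or split the clips that exceed the limit.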

peter05010402 commented 5 years ago

@begeekmyfriend Thank you for your quick reply!!!