syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

shape of linear_outputs is not same as while training #49

Open Mihir-Gajera1 opened 2 years ago

Mihir-Gajera1 commented 2 years ago

I have trained a model on English data. Training converged, and the model generates good wav samples from the training data at checkpoint time. However, when I evaluate the trained model, it takes far too long and produces noisy output, for the reason below.

The relevant line is:

linear_outputs = tf.layers.dense(post_outputs, hp.num_freq) # [N, T_out, F]

During evaluation, the shape of linear_outputs is [1, 200000, 1025], compared with [1, <200, 1025] during training. This makes griffin_lim take far too long to generate the wav. Can someone please explain why this happens?
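As a rough sanity check on how large that difference is, the frame counts can be converted into the audio duration they imply. This is only a sketch: the helper name `implied_duration_seconds` is hypothetical, and the 12.5 ms frame shift is an assumed value that should be replaced with whatever the hparams actually specify.

```python
def implied_duration_seconds(num_frames, frame_shift_ms=12.5):
    """Audio duration implied by a spectrogram with num_frames time steps.

    frame_shift_ms is an ASSUMED hop size, not a value confirmed by this
    issue; substitute the frame shift from your own hparams.
    """
    return num_frames * frame_shift_ms / 1000.0


# T_out values reported above: under 200 frames in training, 200000 in evaluation.
train_seconds = implied_duration_seconds(200)
eval_seconds = implied_duration_seconds(200000)

print("training:   %.1f s" % train_seconds)
print("evaluation: %.1f s (%.1f min)" % (eval_seconds, eval_seconds / 60.0))
print("ratio:      %.0fx" % (eval_seconds / train_seconds))
```

Under that assumption, the evaluation output corresponds to roughly 1000 times more frames than a training sample, so griffin_lim has about 1000 times more spectrogram to invert on every iteration.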