**Closed** · xus-stack closed this issue 5 years ago
Sorry, I got it. The size of stop_token_target is [T_out,]; apparently I was misled by the comment in line 82. The size of stop_token_outputs coming out of the decoder should be [N, T_out, 1], before the reshape. This issue is closed.
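For later readers, a minimal numpy sketch of how those shapes line up (the sizes are made up; the reshape mirrors tacotron.py line 82):

```python
import numpy as np

N, T_out = 32, 100  # hypothetical batch size and decoder output length

# Per example, the stop token target is [T_out,], so a batch is [N, T_out].
stop_token_targets = np.zeros((N, T_out), dtype=np.float32)

# The decoder emits one scalar stop token per step, i.e. [N, T_out, 1] ...
stop_token_outputs = np.random.randn(N, T_out, 1).astype(np.float32)

# ... and reshaping with [batch_size, -1] flattens it to [N, T_out],
# which matches the targets elementwise.
stop_token_outputs = stop_token_outputs.reshape(N, -1)
assert stop_token_outputs.shape == stop_token_targets.shape  # (32, 100)
```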
But another difference concerns me:
In @begeekmyfriend's fork, datafeeder.py line 117:

```python
stop_token_target = np.asarray([0.] * len(mel_target))
```

In Tacotron-2, tacotron/feeder.py line 194:

```python
token_target = np.asarray([0.] * (len(mel_target) - 1))
```
This difference is notable. Might it be a problem here? I believe a 1 should be added at the last frame of the target.
I see, that's a little different from Taco 2, but it's taken care of in feeder.py.
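For anyone following along, here is a rough numpy sketch of what the two constructions give after padding. The helper name and toy lengths are mine; I'm assuming both feeders pad stop token targets with 1.0 up to the padded mel length, which is what the replies above suggest:

```python
import numpy as np

def pad_stop_token_target(t, length):
    # Illustrative stand-in for the repos' padding helpers:
    # fill the tail up to the padded length with 1.0 (= stop).
    return np.pad(t, (0, length - t.shape[0]), mode='constant',
                  constant_values=1.0)

mel_len, padded_len = 5, 8  # toy lengths

# begeekmyfriend's fork, datafeeder.py line 117: zeros for every real frame.
fork_target = pad_stop_token_target(np.asarray([0.] * mel_len), padded_len)
print(fork_target)   # [0. 0. 0. 0. 0. 1. 1. 1.] -> first 1 on the first padded frame

# Tacotron-2, feeder.py line 194: zeros for all but the last real frame.
taco2_target = pad_stop_token_target(np.asarray([0.] * (mel_len - 1)), padded_len)
print(taco2_target)  # [0. 0. 0. 0. 1. 1. 1. 1.] -> first 1 on the last real frame
```

So the one-frame difference only moves where the first 1 appears; the padding itself supplies the 1s in both cases.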
In datafeeder.py, line 117:

```python
stop_token_target = np.asarray([0.] * len(mel_target))
```

Apparently the shape of stop_token_target here is [M,]. It is [N, M] in the batch, maybe? But in tacotron.py, line 82:

```python
stop_token_outputs = tf.reshape(stop_token_outputs, [batch_size, -1])  # [N, T_out, M]
```

And lines 116~118:

```python
self.stop_token_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    labels=self.stop_token_targets, logits=self.stop_token_outputs))
```

In rnn_wrappers.py, the stop token output is apparently a scalar for each decoder step. Isn't the output shaped [N, T_out, 1]? How do the dimensions of the stop token target and output match in the code? Can somebody explain this? Thanks! @begeekmyfriend
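For anyone landing here with the same question, a condensed TF 1.x sketch of why the shapes work out. Variable names follow the quoted code; the per-step loop and sizes are illustrative, not the repo's actual decoder:

```python
import tensorflow as tf  # TF 1.x, as used by the repo

N, T_out = 32, 100  # hypothetical batch size and number of decoder steps
batch_size = N

# rnn_wrappers.py emits one scalar stop token per decoder step;
# stacking T_out of those per-step [N, 1] tensors gives [N, T_out, 1].
step_outputs = [tf.random_normal([N, 1]) for _ in range(T_out)]
stop_token_outputs = tf.stack(step_outputs, axis=1)  # [N, T_out, 1]

# tacotron.py line 82 flattens the trailing 1 away.
stop_token_outputs = tf.reshape(stop_token_outputs, [batch_size, -1])  # [N, T_out]

# The batched targets from datafeeder.py are [N, T_out] as well, so the
# elementwise sigmoid cross entropy in lines 116~118 is well defined.
stop_token_targets = tf.zeros([N, T_out])
stop_token_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
    labels=stop_token_targets, logits=stop_token_outputs))
```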