Open azzhu opened 6 years ago
`jp_batch`'s shape is `(batch_size, number_of_timesteps)` — for example, a sequence `<go>, ね, こ, ..., <eos>`. Note that the character `jp_batch[:, i]` at time step `i` is the input given to the decoder for predicting the next character `jp_batch[:, i+1]`, that is:

- at time step 0, given `<go>`, the decoder predicts ね
- at time step 1, given ね, the decoder predicts こ
- ...
- at time step t-1, given ..., the decoder predicts `<eos>`

Therefore, the target labels are just `jp_batch` itself shifted one step ahead, i.e. left-shifted by one time step: `labels = jp_batch[:, 1:]`.
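The input/target alignment above can be sketched with plain NumPy (the token ids here are made up for illustration; in the real model they come from the Japanese vocabulary):

```python
import numpy as np

# Toy batch of one sequence: <go>=1, ね=2, こ=3, <eos>=4
jp_batch = np.array([[1, 2, 3, 4]])

decoder_inputs = jp_batch[:, :-1]  # <go>, ね, こ   — fed to the decoder
labels = jp_batch[:, 1:]           # ね, こ, <eos> — what it must predict

# At each step t, the decoder sees decoder_inputs[:, t]
# and is trained to output labels[:, t].
for t in range(decoder_inputs.shape[1]):
    print("step %d: given id %d, predict id %d"
          % (t, decoder_inputs[0, t], labels[0, t]))
```

So the two slices are the same sequence offset by one position, which is exactly the teacher-forcing setup described above.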
thanks!
```python
def init_loss(hg, jp_batch, device):
    with tf.variable_scope("logits", reuse=device > 0), tf.device("/gpu:%d" % device):
        logits = slim.fully_connected(hg, jp_vocab_size)  ## ?
        logits = logits[:, :-1]
        pred = tf.nn.softmax(logits)
        logits_shape = tf.shape(logits)
        logits = tf.reshape(logits, [logits_shape[0] * logits_shape[1], jp_vocab_size])
        labels = jp_batch[:, 1:]
        labels = tf.reshape(labels, [-1])
        loss_mask = labels > 0
        logits = tf.boolean_mask(logits, loss_mask)
        labels = tf.boolean_mask(labels, loss_mask)
```
In particular, this line: `labels = jp_batch[:, 1:]` — why do the labels start from index 1 rather than 0?
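The `labels > 0` padding mask in the snippet above can also be illustrated with plain NumPy (the label ids and per-token losses below are hypothetical; id 0 is assumed to be the padding token, as the snippet implies):

```python
import numpy as np

# Hypothetical flattened shifted labels, 0 = <pad>,
# and made-up per-token losses for the same positions.
labels = np.array([2, 3, 4, 0, 0])
losses = np.array([0.5, 0.2, 0.1, 9.0, 9.0])

loss_mask = labels > 0        # same condition as in the snippet
masked = losses[loss_mask]    # NumPy analogue of tf.boolean_mask

# Only the non-padding positions contribute to the mean loss.
print(masked.mean())
```

This way the large losses on padding positions never affect training, mirroring what `tf.boolean_mask` does in the original code.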