sdanaipat / fairseq-translator

A quick Tensorflow implementation of Facebook FairSeq[1] for character-level neural machine translation (EN -> JP).

labels = jp_batch[:, 1:] — why do the labels start from 1? #1

Open azzhu opened 6 years ago

azzhu commented 6 years ago

    def init_loss(hg, jp_batch, device):
        with tf.variable_scope("logits", reuse=device > 0), tf.device("/gpu:%d" % device):
            logits = slim.fully_connected(hg, jp_vocab_size)  ## ?
            logits = logits[:, :-1]
            pred = tf.nn.softmax(logits)
            logits_shape = tf.shape(logits)
            logits = tf.reshape(logits, [logits_shape[0] * logits_shape[1], jp_vocab_size])
            labels = jp_batch[:, 1:]
            labels = tf.reshape(labels, [-1])
            loss_mask = labels > 0
            logits = tf.boolean_mask(logits, loss_mask)
            labels = tf.boolean_mask(labels, loss_mask)

            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
            loss = tf.reduce_mean(loss)
            tf.summary.scalar('softmax_loss', loss)
            return loss

In this function, why does labels = jp_batch[:, 1:] start the labels from index 1 rather than 0?

sdanaipat commented 6 years ago

jp_batch's shape is (batch_size, number_of_timesteps); each row is a sequence such as <go>, ね, こ, ..., <eos>. Note that the character at time step i, jp_batch[:, i], is the input given to the decoder for predicting the next character, jp_batch[:, i+1]. That is:

- at time step 0, given <go>, the decoder predicts ね
- at time step 1, given ね, the decoder predicts こ
- ...
- at time step t-1, given the last character before <eos>, the decoder predicts <eos>

Therefore, the target labels are jp_batch itself one step ahead: the targets we are trying to predict are jp_batch left-shifted by one time step, i.e., labels = jp_batch[:, 1:].
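
A minimal sketch of that alignment (not code from this repo; plain NumPy with made-up character ids, where 0 is padding, 1 is <go>, 2 is <eos>, and 10/11 stand in for ね/こ):

    import numpy as np

    # Hypothetical ids: 0 = <pad>, 1 = <go>, 2 = <eos>; 10 and 11 stand for ね and こ.
    jp_batch = np.array([
        [1, 10, 11, 2, 0],              # <go> ね こ <eos> <pad>
    ])

    decoder_inputs = jp_batch[:, :-1]   # what the decoder sees at steps 0..t-1
    labels = jp_batch[:, 1:]            # what it should predict at those same steps

    # Step i: given decoder_inputs[:, i], the target is labels[:, i]
    for i in range(labels.shape[1]):
        print("step %d: input id %d -> target id %d"
              % (i, decoder_inputs[0, i], labels[0, i]))

In init_loss the same shift is done on the output side instead: logits[:, :-1] drops the prediction made from the last input so the logits line up with labels = jp_batch[:, 1:], and loss_mask = labels > 0 then excludes the padded positions from the loss.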

azzhu commented 6 years ago

thanks!