@imanoluria The cost function should be able to deal with variable sequence lengths. One defines max_seq_len and then pads each sequence with zero vectors as dummy data so that all sequences have the same length. This is done so that batched data can be fed to the GPU.
The loss on the dummy data is still calculated. You can use categorical cross entropy to compute the cost, but you should mask out (set to zero) the part of the cost that resulted from the dummy data.
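For instance, the padding step might look something like this (a minimal NumPy sketch; the sequences, lengths, and dimensions are made up for illustration):

import numpy as np

# Three sequences of different lengths, each timestep a 3-dimensional vector,
# padded with zero vectors so every sequence has max_seq_len timesteps.
sequences = [np.ones((2, 3)), np.ones((5, 3)), np.ones((3, 3))]
max_seq_len = max(s.shape[0] for s in sequences)

padded = np.zeros((len(sequences), max_seq_len, 3))
seq_len = np.array([s.shape[0] for s in sequences])
for i, s in enumerate(sequences):
    padded[i, :s.shape[0], :] = s  # real data first, zero vectors after

print(padded.shape)  # (3, 5, 3) -- every sequence now has max_seq_len steps
print(seq_len)       # [2 5 3]   -- the true lengths, kept for masking later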
You can play around with the full example below to understand what is going on. Run different components of the graph, print the results, and see for yourself.
import numpy as np
import tensorflow as tf

in_dim = 3
nsteps = 10
num_sen = 5

# Dummy data: predictions, true sequence lengths, targets, and a zero-padded
# copy of the targets that is used only to derive the mask.
y_pred = np.array([[[1 for i in range(in_dim)] for j in range(nsteps)] for k in range(num_sen)])
seq_len = np.array([i + 1 for i in range(num_sen)])
y_true = np.array([[[i + j + 10 * k for i in range(in_dim)] for j in range(nsteps)] for k in range(num_sen)])
y_true_for_mask = np.array([[[(1 if j < seq_len[k] else 0) for i in range(in_dim)] for j in range(nsteps)] for k in range(num_sen)])

def variable_length_cost_with_target_as_mask():
    tf.reset_default_graph()
    with tf.Session() as sess:
        _input_shape = (None, nsteps, in_dim)
        g_y_pred = tf.placeholder(tf.float32, _input_shape, name='y_pred')
        g_y_true = tf.placeholder(tf.float32, _input_shape, name='y_true')
        g_seq_len = tf.placeholder(tf.int32, None, name='seq_len')
        g_y_true_for_mask = tf.placeholder(tf.float32, _input_shape, name='y_true_for_mask')

        # Per-timestep cross entropy, shape (num_sen, nsteps)
        entropy = tf.nn.softmax_cross_entropy_with_logits(labels=g_y_true, logits=g_y_pred)
        # Mask is 1 where the padded targets contain real data, 0 elsewhere
        mask = tf.sign(tf.reduce_max(tf.abs(g_y_true_for_mask), reduction_indices=2))
        entropy *= mask
        # Sum over time and average by the true sequence length
        entropy = tf.reduce_sum(entropy, reduction_indices=1)
        entropy = entropy / tf.cast(g_seq_len, tf.float32)
        cost = sess.run(entropy, feed_dict={g_y_true: y_true,
                                            g_y_pred: y_pred,
                                            g_y_true_for_mask: y_true_for_mask,
                                            g_seq_len: seq_len})
        print("cost:", cost)

variable_length_cost_with_target_as_mask()
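A related option is to build the mask directly from the sequence lengths with tf.sequence_mask instead of deriving it from the zero-padded targets. A short self-contained sketch, assuming TF 1.x and the same nsteps and seq_len as above:

import numpy as np
import tensorflow as tf

nsteps = 10
seq_len = np.array([1, 2, 3, 4, 5])

tf.reset_default_graph()
with tf.Session() as sess:
    g_seq_len = tf.placeholder(tf.int32, [None], name='seq_len')
    # (num_sen, nsteps) mask: 1.0 for real timesteps, 0.0 for padded ones
    mask = tf.cast(tf.sequence_mask(g_seq_len, maxlen=nsteps), tf.float32)
    print(sess.run(mask, feed_dict={g_seq_len: seq_len}))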
And this is how I used the TensorFlow-provided cost function and performed the masking:
# Cross Entropy Loss
def _loss(self):
    # L2 regularization over all trainable variables
    self.l2 = self.config.l2_lambda * sum(tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables())
    # Mask is 1 for timesteps with real targets, 0 for the zero-padded ones
    length_mask = tf.sign(tf.reduce_max(tf.abs(self.output), reduction_indices=2))
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=self.output, logits=self.logits)
    cross_entropy *= length_mask
    # Sum over time and average by the true sequence length
    cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1) / tf.cast(self.seq_len, tf.float32)
    self.loss = tf.reduce_mean(cross_entropy) + self.l2
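If the true lengths are not tracked separately, they can also be recovered from the same kind of mask, since the mask sums to the number of real timesteps. A minimal sketch, assuming the targets are zero-padded as above (the small 'output' tensor here is made up for illustration):

import tensorflow as tf

# Two padded sequences of true lengths 1 and 2
output = tf.constant([[[1., 1., 1.], [0., 0., 0.]],
                      [[1., 1., 1.], [1., 1., 1.]]])
# 1 where a timestep contains real data, 0 where it is zero padding
length_mask = tf.sign(tf.reduce_max(tf.abs(output), reduction_indices=2))
# Summing the mask over time recovers the per-sequence lengths
seq_len_from_mask = tf.reduce_sum(length_mask, reduction_indices=1)

with tf.Session() as sess:
    print(sess.run(seq_len_from_mask))  # [1. 2.]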
Very helpful! Thanks!
Hi, thanks for your great work! I have a question about the cost function: why do you define your own cost function instead of using "categorical_crossentropy", and what is the difference between the two?