monikkinom / ner-lstm

Named Entity Recognition using multilayered bidirectional LSTM
540 stars 182 forks

Cost function #21

Closed iuria21 closed 7 years ago

iuria21 commented 7 years ago

Hi, thanks for your great work! I have a question about the cost function: why do you define a custom cost function instead of using "categorical_crossentropy", and what is the difference between the two?

utkrist commented 7 years ago

@imanoluria The cost function has to deal with variable sequence lengths. One defines max_seq_len and then pads each sequence with zero vectors as dummy data so that all sequences have the same length. This is done so that batched data can be fed to the GPU.

The loss on the dummy data gets calculated as well. You can use categorical cross entropy to compute the cost, but you should still mask out (set to zero) the part of the cost that resulted from the dummy data.
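
For concreteness, padding to max_seq_len could look something like this (a minimal sketch, not code from this repo; pad_to_max_len and its argument names are made up for illustration):

import numpy as np

def pad_to_max_len(sequences, max_seq_len, dim):
    # sequences: list of arrays of shape (length_i, dim) with length_i <= max_seq_len
    # returns the padded batch (batch_size, max_seq_len, dim) and the true lengths
    batch = np.zeros((len(sequences), max_seq_len, dim), dtype=np.float32)
    lengths = np.zeros(len(sequences), dtype=np.int32)
    for i, seq in enumerate(sequences):
        batch[i, :len(seq), :] = seq   # real data
        lengths[i] = len(seq)          # everything after this stays as zero-vector dummy data
    return batch, lengths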

You can play around with this code to understand what is going on. Run different components of the graph, print the results and see for yourself.

import numpy as np
import tensorflow as tf

in_dim = 3
nsteps = 10
num_sen = 5

# Data: dummy predictions, true sequence lengths, targets, and targets used only to build the mask
y_pred = np.array([[[1 for i in range(in_dim)] for j in range(nsteps)] for k in range(num_sen)])
seq_len = np.array([i+1 for i in range(num_sen)])
y_true = np.array([[[i+j+10*k for i in range(in_dim)] for j in range(nsteps)] for k in range(num_sen)])
y_true_for_mask = np.array([[[(1 if j < seq_len[k] else 0) for i in range(in_dim)] for j in range(nsteps)] for k in range(num_sen)])

def variable_length_cost_with_target_as_mask():
    tf.reset_default_graph()

    with tf.Session() as sess:
        _input_shape  = (None, nsteps, in_dim)
        _output_shape = (None, nsteps, in_dim)

        g_y_pred   = tf.placeholder(tf.float32, _input_shape, name='y_pred')
        g_y_true   = tf.placeholder(tf.float32, _input_shape, name='y_true')
        g_seq_len  = tf.placeholder(tf.int32, None, name='seq_len')
        g_y_true_for_mask = tf.placeholder(tf.float32, _input_shape, name='y_true_test')

        # Per-timestep cross entropy, shape (num_sen, nsteps)
        entropy = tf.nn.softmax_cross_entropy_with_logits(labels=g_y_true, logits=g_y_pred)

        # 1.0 where the mask target is a non-zero vector (real timestep), 0.0 where it is all-zero padding
        mask = tf.sign(tf.reduce_max(tf.abs(g_y_true_for_mask), reduction_indices=2))

        # Zero out the loss contributed by the padded timesteps
        entropy *= mask

        # Sum over time and divide by the true length -> average loss per sequence
        entropy = tf.reduce_sum(entropy, reduction_indices=1)
        entropy = entropy / tf.cast(g_seq_len, tf.float32)

        cost = sess.run(entropy, feed_dict={g_y_true: y_true,
                                            g_y_pred: y_pred,
                                            g_y_true_for_mask: y_true_for_mask,
                                            g_seq_len: seq_len})

        print("cost:", cost)
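
If you'd rather not derive the mask from the targets, tf.sequence_mask builds the same mask directly from the lengths. A minimal sketch under the same setup as above (an alternative, not what the repo does):

# Build the mask from the true lengths instead of from the padded targets
mask = tf.cast(tf.sequence_mask(g_seq_len, maxlen=nsteps), tf.float32)   # (num_sen, nsteps), 1.0 for real steps
entropy = tf.nn.softmax_cross_entropy_with_logits(labels=g_y_true, logits=g_y_pred)
entropy = tf.reduce_sum(entropy * mask, reduction_indices=1) / tf.cast(g_seq_len, tf.float32)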

And this is how I used the TensorFlow-provided cost function and performed the masking:

# Cross Entropy Loss
    def _loss(self):
        # L2 regularisation over all trainable variables
        self.l2 = self.config.l2_lambda * sum(tf.nn.l2_loss(tf_var) for tf_var in tf.trainable_variables())
        # 1.0 for real timesteps, 0.0 for zero-padded ones (padded targets are all-zero vectors)
        length_mask   = tf.sign(tf.reduce_max(tf.abs(self.output), reduction_indices=2))
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(labels=self.output, logits=self.logits)
        cross_entropy *= length_mask
        # Average the masked loss over the true sequence length for each sequence
        cross_entropy = tf.reduce_sum(cross_entropy, reduction_indices=1) / tf.cast(self.seq_len, tf.float32)
        self.loss = tf.reduce_mean(cross_entropy) + self.l2
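
To tie this back to the original question: categorical cross entropy computes the same per-timestep quantity as the softmax cross entropy above, so the real difference is only the masking and the division by the true length. A rough sketch of that equivalence (for illustration only, not code from the repo):

# Per-timestep categorical cross entropy written out by hand
probs = tf.nn.softmax(self.logits)
manual_ce = -tf.reduce_sum(self.output * tf.log(probs + 1e-8), reduction_indices=2)
# manual_ce matches (up to numerical precision) the per-timestep values of
# tf.nn.softmax_cross_entropy_with_logits(labels=self.output, logits=self.logits),
# so with either formulation you still need length_mask and the per-length normalisation.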
iuria21 commented 7 years ago

Very helpful! Thanks!