tensorflow / models

Models and examples built with TensorFlow

When I change the [slim] train_image_classifier to a person re-ID model, I get a very big loss #1238

Closed andongchen closed 7 years ago

andongchen commented 7 years ago

https://github.com/tensorflow/models/tree/master/slim — when I change the [slim] train_image_classifier into a person re-ID model, fine-tuning from the resnet_v1_50 checkpoint, I get a very big loss. The code I changed looks like this:

  # Batch twice the usual batch size, then split into two image/label
  # streams so each training step sees a pair of batches.
  images, labels = tf.train.batch(
      [image, label],
      batch_size=FLAGS.batch_size * 2,
      num_threads=FLAGS.num_preprocessing_threads,
      capacity=5 * FLAGS.batch_size)
  images1 = tf.slice(images, [0, 0, 0, 0], [FLAGS.batch_size, -1, -1, -1])
  images2 = tf.slice(images, [FLAGS.batch_size, 0, 0, 0], [-1, -1, -1, -1])
  labels1 = tf.slice(labels, [0], [FLAGS.batch_size])
  labels2 = tf.slice(labels, [FLAGS.batch_size], [-1])
  # Pair label: 1.0 where the two identities match, 0.0 otherwise.
  labels_y = tf.cast(tf.equal(labels1, labels2), tf.float32)
  labels_y_ = labels_y - 1
  labels_y_2 = tf.concat([labels_y, labels_y_], 0)
  labels_y_2 = tf.reshape(labels_y_2, [FLAGS.batch_size, 2])
  labels1 = slim.one_hot_encoding(
      labels1, dataset.num_classes - FLAGS.labels_offset)
  labels2 = slim.one_hot_encoding(
      labels2, dataset.num_classes - FLAGS.labels_offset)
  batch_queue1 = slim.prefetch_queue.prefetch_queue(
      [images1, labels1], capacity=2 * deploy_config.num_clones)
  batch_queue2 = slim.prefetch_queue.prefetch_queue(
      [images2, labels2], capacity=2 * deploy_config.num_clones)
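For reference, here is a small pure-Python sketch (no TensorFlow, with hypothetical example labels) of what the `labels_y_2` construction above actually produces. Note that `tf.reshape` is row-major, so it interleaves the two concatenated halves rather than pairing `labels_y[i]` with `labels_y_[i]`, and `labels_y - 1` introduces -1 entries; the resulting rows are not valid one-hot vectors, which matches the label mistake acknowledged further down the thread:

```python
# Hypothetical identity labels for a batch of 4 pairs.
labels1 = [3, 5, 3, 7]
labels2 = [3, 5, 9, 7]

# labels_y: 1.0 where the pair matches, 0.0 otherwise.
labels_y = [1.0 if a == b else 0.0 for a, b in zip(labels1, labels2)]
labels_y_ = [v - 1 for v in labels_y]  # 0.0 or -1.0

# tf.concat(..., 0) followed by tf.reshape(..., [batch_size, 2]) is
# row-major: row i contains elements 2*i and 2*i + 1 of the concatenation,
# NOT (labels_y[i], labels_y_[i]).
flat = labels_y + labels_y_
labels_y_2 = [flat[2 * i: 2 * i + 2] for i in range(4)]
print(labels_y_2)  # [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0], [-1.0, 0.0]]
```

A valid one-hot pair label would instead put `[1, 0]` for a match and `[0, 1]` for a mismatch, e.g. `[[y, 1 - y] for y in labels_y]`.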

####################
# Define the model #
####################
def clone_fn(batch_queue1, batch_queue2, labels_y_2):
  """Allows data parallelism by creating multiple clones of network_fn."""
  images1, labels1 = batch_queue1.dequeue()
  images2, labels2 = batch_queue2.dequeue()
  # with tf.variable_scope('siamese') as scope:
  logits1, end_points1 = network_fn(images1)
  # scope.reuse_variables()
  logits2, end_points2 = network_fn(images2)

  # Squared element-wise difference between the two pool5 embeddings.
  l1_flat = tf.squeeze(end_points1['pool5'], [1, 2], name='SpatialSqueeze')
  l2_flat = tf.squeeze(end_points2['pool5'], [1, 2], name='SpatialSqueeze')
  eucd2 = tf.pow(tf.subtract(l1_flat, l2_flat), 2)
  with tf.name_scope('siamese'):
    weights_siamese = tf.Variable(
        tf.random_normal([2048, 2], stddev=0.1), name='weights_siamese')
    biase_siamese = tf.Variable(tf.constant(0.1, shape=[2]), name='biase_siamese')
    y = tf.matmul(eucd2, weights_siamese) + biase_siamese

  #############################
  # Specify the loss function #
  #############################
  tf.losses.softmax_cross_entropy(
      logits=logits1, onehot_labels=labels1,
      label_smoothing=FLAGS.label_smoothing, weights=0.5)
  tf.losses.softmax_cross_entropy(
      logits=logits2, onehot_labels=labels2,
      label_smoothing=FLAGS.label_smoothing, weights=0.5)
  tf.losses.softmax_cross_entropy(
      logits=y, onehot_labels=labels_y_2, weights=1.0)

  return end_points1, end_points2

The loss looks like this:

    INFO:tensorflow:global_step/sec: 0
    INFO:tensorflow:global step 10: loss = -0.7855 (0.11 sec/step)
    INFO:tensorflow:global step 20: loss = 9.6894 (0.12 sec/step)
    INFO:tensorflow:global step 30: loss = 260.1677 (0.10 sec/step)
    INFO:tensorflow:global step 40: loss = 75.9224 (0.12 sec/step)
    INFO:tensorflow:global step 50: loss = -155.9554 (0.12 sec/step)
    INFO:tensorflow:global step 60: loss = -290.5832 (0.10 sec/step)
    INFO:tensorflow:global step 70: loss = 1907.8824 (0.11 sec/step)
    INFO:tensorflow:global step 80: loss = 3305.5435 (0.11 sec/step)
    INFO:tensorflow:global step 90: loss = 5179.0605 (0.12 sec/step)
    INFO:tensorflow:global step 100: loss = 26149.2090 (0.18 sec/step)
    INFO:tensorflow:global step 110: loss = 7243.3086 (0.14 sec/step)
    INFO:tensorflow:global step 120: loss = 20228.7734 (0.12 sec/step)
    INFO:tensorflow:global step 130: loss = -33802.4336 (0.12 sec/step)
    INFO:tensorflow:global step 140: loss = -16943.0703 (0.11 sec/step)
    INFO:tensorflow:global step 150: loss = 13684.9248 (0.10 sec/step)
    INFO:tensorflow:global step 160: loss = 66518.9297 (0.10 sec/step)
    INFO:tensorflow:global step 170: loss = -24630.4824 (0.11 sec/step)
    INFO:tensorflow:global step 180: loss = 56567.5039 (0.12 sec/step)
    INFO:tensorflow:global step 190: loss = 12862.8721 (0.11 sec/step)
    INFO:tensorflow:global step 200: loss = 44358.7070 (0.10 sec/step)
    INFO:tensorflow:global step 210: loss = 1338.1443 (0.11 sec/step)
    INFO:tensorflow:global step 220: loss = 54169.8281 (0.11 sec/step)
    INFO:tensorflow:global step 230: loss = 6509.2778 (0.11 sec/step)
    INFO:tensorflow:global step 240: loss = 34684.2070 (0.12 sec/step)
    INFO:tensorflow:global step 250: loss = 99308.3984 (0.11 sec/step)
    INFO:tensorflow:global step 260: loss = 101687.7422 (0.11 sec/step)
    INFO:tensorflow:global step 270: loss = -333855.3750 (0.10 sec/step)
    INFO:tensorflow:global step 280: loss = -34892.8320 (0.10 sec/step)
    INFO:tensorflow:global step 290: loss = -303499.2188 (0.11 sec/step)
    INFO:tensorflow:global step 300: loss = -273107.8750 (0.10 sec/step)
    INFO:tensorflow:global step 310: loss = 269577.0625 (0.11 sec/step)
    INFO:tensorflow:global step 320: loss = 2474.6311 (0.11 sec/step)
    INFO:tensorflow:global step 330: loss = -19816.9062 (0.11 sec/step)
    INFO:tensorflow:global step 340: loss = -889736.6250 (0.11 sec/step)
    INFO:tensorflow:global step 350: loss = 370747.0312 (0.12 sec/step)
    INFO:tensorflow:global step 360: loss = 5192.4868 (0.12 sec/step)
    INFO:tensorflow:global step 370: loss = 54307.8164 (0.12 sec/step)
    INFO:tensorflow:global step 380: loss = 68392.2969 (0.11 sec/step)
    INFO:tensorflow:global step 390: loss = 645659.6250 (0.11 sec/step)
    INFO:tensorflow:global step 400: loss = 1106760.7500 (0.13 sec/step)
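For what it's worth, softmax cross-entropy is -Σᵢ pᵢ·log(qᵢ) with q = softmax(logits), and it is only guaranteed non-negative when the label vector p is a proper probability distribution. A pure-Python sketch (hypothetical numbers) showing how a -1 label entry, like those `labels_y_2` can contain, flips the sign of the loss, so gradient descent is rewarded for driving it toward -∞:

```python
import math

def softmax_cross_entropy(logits, labels):
    # Plain softmax followed by cross-entropy: -sum(p * log q).
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    log_q = [z - m - math.log(total) for z in logits]
    return -sum(p * lq for p, lq in zip(labels, log_q))

# Valid one-hot label: loss is non-negative.
print(softmax_cross_entropy([2.0, -1.0], [1.0, 0.0]))   # ~0.0486

# A label row with a -1 entry: loss goes negative, and minimizing it
# pushes the logits to diverge rather than converge.
print(softmax_cross_entropy([2.0, -1.0], [-1.0, 0.0]))  # ~-0.0486
```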

andongchen commented 7 years ago

But when I apply a softmax to the siamese logits, the loss looks like this: the loss descends to a very small value, but the siamese loss still descends to negative values when the step count gets very big. Does tf.losses.softmax_cross_entropy need a softmax applied to its logits first?

def clone_fn(batch_queue1, batch_queue2, labels_y_2):
  """Allows data parallelism by creating multiple clones of network_fn."""
  images1, labels1 = batch_queue1.dequeue()
  images2, labels2 = batch_queue2.dequeue()
  # with tf.variable_scope('siamese') as scope:
  logits1, end_points1 = network_fn(images1)
  # scope.reuse_variables()
  logits2, end_points2 = network_fn(images2)

  l1_flat = tf.squeeze(end_points1['pool5'], [1, 2], name='SpatialSqueeze')
  l2_flat = tf.squeeze(end_points2['pool5'], [1, 2], name='SpatialSqueeze')
  eucd2 = tf.pow(tf.subtract(l1_flat, l2_flat), 2)
  with tf.name_scope('siamese'):
    weights_siamese = tf.Variable(
        tf.random_normal([2048, 2], stddev=0.1), name='weights_siamese')
    biase_siamese = tf.Variable(tf.constant(0.1, shape=[2]), name='biase_siamese')
    y = tf.matmul(eucd2, weights_siamese) + biase_siamese
    # Softmax applied here, before the cross-entropy loss below.
    y_ = tf.nn.softmax(y)

  #############################
  # Specify the loss function #
  #############################
  tf.losses.softmax_cross_entropy(
      logits=logits1, onehot_labels=labels1,
      label_smoothing=FLAGS.label_smoothing, weights=0.5)
  tf.losses.softmax_cross_entropy(
      logits=logits2, onehot_labels=labels2,
      label_smoothing=FLAGS.label_smoothing, weights=0.5)
  tf.losses.softmax_cross_entropy(
      logits=y_, onehot_labels=labels_y_2, weights=1.0)

  return end_points1, end_points2

    INFO:tensorflow:Starting Session.
    INFO:tensorflow:Starting Queues.
    INFO:tensorflow:global_step/sec: 0
    INFO:tensorflow:global step 10: loss = 6.0470 (0.10 sec/step)
    INFO:tensorflow:global step 20: loss = 4.7737 (0.10 sec/step)
    INFO:tensorflow:global step 30: loss = 4.5574 (0.11 sec/step)
    INFO:tensorflow:global step 40: loss = 3.6830 (0.10 sec/step)
    INFO:tensorflow:global step 50: loss = 5.1997 (0.11 sec/step)
    INFO:tensorflow:global step 60: loss = 1.6952 (0.10 sec/step)
    INFO:tensorflow:global step 70: loss = 3.1358 (0.10 sec/step)
    INFO:tensorflow:global step 80: loss = 2.8293 (0.10 sec/step)
    INFO:tensorflow:global step 90: loss = 7.7483 (0.10 sec/step)
    INFO:tensorflow:global step 100: loss = 6.2354 (0.10 sec/step)
    INFO:tensorflow:global step 110: loss = 7.8672 (0.11 sec/step)
    INFO:tensorflow:global step 120: loss = 6.3874 (0.10 sec/step)
    INFO:tensorflow:global step 130: loss = 5.1675 (0.11 sec/step)
    INFO:tensorflow:global step 140: loss = 4.4336 (0.10 sec/step)
    INFO:tensorflow:global step 150: loss = 3.4140 (0.10 sec/step)
    INFO:tensorflow:global step 160: loss = 5.9614 (0.11 sec/step)
    INFO:tensorflow:global step 170: loss = 0.3056 (0.10 sec/step)
    INFO:tensorflow:global step 180: loss = 5.9729 (0.10 sec/step)
    INFO:tensorflow:global step 190: loss = 1.8424 (0.11 sec/step)
    INFO:tensorflow:global step 200: loss = 6.5091 (0.12 sec/step)
    INFO:tensorflow:global step 210: loss = 6.3883 (0.11 sec/step)
    INFO:tensorflow:global step 220: loss = 3.3903 (0.11 sec/step)
    INFO:tensorflow:global step 230: loss = 4.4493 (0.11 sec/step)
    INFO:tensorflow:global step 240: loss = 2.9825 (0.12 sec/step)
    INFO:tensorflow:global step 250: loss = 5.5893 (0.14 sec/step)
    INFO:tensorflow:global step 260: loss = 3.1772 (0.10 sec/step)
    INFO:tensorflow:global step 270: loss = 10.9485 (0.14 sec/step)
    INFO:tensorflow:global step 280: loss = 4.6115 (0.13 sec/step)
    INFO:tensorflow:global step 290: loss = 5.2641 (0.14 sec/step)
    INFO:tensorflow:global step 300: loss = 5.3500 (0.13 sec/step)
    INFO:tensorflow:global step 310: loss = 5.0483 (0.11 sec/step)
    INFO:tensorflow:global step 320: loss = 5.6563 (0.15 sec/step)
    INFO:tensorflow:global step 330: loss = 5.3334 (0.12 sec/step)
    INFO:tensorflow:global step 340: loss = 9.4017 (0.14 sec/step)
    INFO:tensorflow:global step 350: loss = 6.4664 (0.13 sec/step)
    INFO:tensorflow:global step 360: loss = 4.5586 (0.15 sec/step)
    INFO:tensorflow:global step 370: loss = 6.3450 (0.11 sec/step)
    INFO:tensorflow:global step 380: loss = 13.3093 (0.14 sec/step)
    INFO:tensorflow:global step 390: loss = 4.7924 (0.14 sec/step)
    INFO:tensorflow:global step 400: loss = 6.3047 (0.12 sec/step)
    INFO:tensorflow:global step 410: loss = 0.9263 (0.12 sec/step)
    INFO:tensorflow:global step 420: loss = 2.1498 (0.12 sec/step)
    INFO:tensorflow:global step 430: loss = 7.2051 (0.11 sec/step)
    INFO:tensorflow:global step 440: loss = 3.4443 (0.14 sec/step)
    INFO:tensorflow:global step 450: loss = 3.3235 (0.12 sec/step)
    INFO:tensorflow:global_step/sec: 7.73834
    INFO:tensorflow:global step 460: loss = 6.9202 (0.13 sec/step)
    INFO:tensorflow:global step 470: loss = 8.3436 (0.11 sec/step)
    INFO:tensorflow:global step 480: loss = 3.9146 (0.10 sec/step)
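For reference, `tf.losses.softmax_cross_entropy` (like `tf.nn.softmax_cross_entropy_with_logits`) applies the softmax itself, so it should be given the raw, unnormalized logits. Feeding it already-softmaxed probabilities applies softmax twice, which squashes the inputs into [0, 1] and puts a floor under the loss. A pure-Python sketch with hypothetical numbers:

```python
import math

def softmax(zs):
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

def softmax_cross_entropy(logits, onehot):
    # What the TF op computes internally: softmax, then -sum(p * log q).
    q = softmax(logits)
    return -sum(p * math.log(qi) for p, qi in zip(onehot, q))

logits = [10.0, 0.0]  # confident, correct prediction
label = [1.0, 0.0]

# Raw logits: the loss can approach 0 as confidence grows.
print(softmax_cross_entropy(logits, label))           # ~4.5e-05

# Pre-softmaxed input: the values are squashed into [0, 1], so after the
# internal softmax the loss can never drop below ~log(1 + e^-1) ~ 0.313
# for two classes, no matter how confident the model becomes.
print(softmax_cross_entropy(softmax(logits), label))  # ~0.3133
```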

andongchen commented 7 years ago

Sorry, it was my mistake! I was actually feeding an incorrect one-hot label.

TaihuLight commented 7 years ago

Could you share your code with me?

andongchen commented 7 years ago

Of course! Give me your mail.

zouhongwei commented 7 years ago

Could you share your code with me? My mail is zouhongwei@hust.edu.cn. Thank you!