iitzco / faced

🚀 😏 Near Real Time CPU Face detection using deep learning
MIT License

How to train on new generic object class #3

Open arkeys opened 6 years ago

arkeys commented 6 years ago

How was faced trained on the WIDER FACE dataset, and how was the dataset prepared?

iitzco commented 6 years ago

Hi @arkeys

I will try to upload all training scripts soon. However, let me explain briefly how I trained faced:

1°) The WIDER FACE dataset was transformed to the tfrecord format using this repo.
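Roughly, that conversion serializes each image plus its flattened box list into a tf.train.Example. Here is a minimal sketch with illustrative field names (not that repo's actual schema):

import tensorflow as tf

def to_example(jpeg_bytes, boxes):
    # boxes: list of [x0, x1, y0, y1] lists; flatten into a single float list.
    flat = [float(v) for box in boxes for v in box]
    return tf.train.Example(features=tf.train.Features(feature={
        'image/encoded': tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
        'image/object/bbox': tf.train.Feature(float_list=tf.train.FloatList(value=flat)),
    }))

def write_records(examples, path):
    # examples: iterable of (jpeg_bytes, boxes) pairs.
    with tf.io.TFRecordWriter(path) as writer:
        for jpeg_bytes, boxes in examples:
            writer.write(to_example(jpeg_bytes, boxes).SerializeToString())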

2°) For each example (an image with several [x0, x1, y0, y1] bounding boxes), I prepared the corresponding target tensor.

These preparation steps are similar to the ones YOLO uses; you can read more about them here, and see the sketch below.
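Concretely, each image gets a 9x9x5 target: every ground-truth box is assigned to the grid cell containing its center, with (x, y) encoded relative to that cell and (w, h) relative to the whole image. A minimal sketch of that encoding (assuming boxes are already normalized to [0, 1] as [x_center, y_center, w, h]; the names are mine, not the repo's):

import numpy as np

GRID = 9

def make_target(boxes):
    # boxes: array of normalized [x_center, y_center, w, h] rows in [0, 1].
    target = np.zeros((GRID, GRID, 5), dtype=np.float32)
    for xc, yc, w, h in boxes:
        col = min(int(xc * GRID), GRID - 1)  # grid cell holding the box center
        row = min(int(yc * GRID), GRID - 1)
        cell_x = xc * GRID - col             # center offset within that cell
        cell_y = yc * GRID - row
        target[row, col] = [1.0, cell_x, cell_y, w, h]  # [p, x, y, w, h]
    return target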

3°) Horizontal flips were applied to all examples to augment the dataset.
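The flip mirrors the image and the x coordinate of every box along with it; something like this, using the same normalized box format as above:

import numpy as np

def hflip(image, boxes):
    # image: HxWxC array; boxes: array of normalized [x_center, y_center, w, h].
    flipped = image[:, ::-1].copy()
    boxes = np.array(boxes, dtype=np.float32)
    boxes[:, 0] = 1.0 - boxes[:, 0]  # mirror x centers; y, w, h stay unchanged
    return flipped, boxes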

4°) For each forward pass, a 9x9x5 output tensor was obtained (after a series of convolution/pooling/dropout/BN layers).
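Each of the 9x9 cells therefore predicts [p, x, y, w, h]; at inference time, cells whose probability clears a threshold are mapped back to image-relative coordinates. Roughly (illustrative names, not the repo's API):

GRID = 9

def decode(output, threshold=0.5):
    # output: (9, 9, 5) array of [p, x, y, w, h] per cell -> normalized boxes.
    boxes = []
    for row in range(GRID):
        for col in range(GRID):
            p, x, y, w, h = output[row, col]
            if p >= threshold:
                xc = (col + x) / GRID  # cell-relative center -> image-relative
                yc = (row + y) / GRID
                boxes.append((p, xc, yc, w, h))
    return boxes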

5°) Training was done using the Adam optimizer with a fixed learning rate of 5e-5.
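In TF1-era code (which is an assumption on my part), that amounts to something like:

# loss: the YOLO-style loss over the 9x9x5 output described above.
train_op = tf.compat.v1.train.AdamOptimizer(learning_rate=5e-5).minimize(loss)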

For more information about the architecture, you can read my recent post here.

You should be able to repeat this training process with any object class you wish to perform object detection with.

Hope this helps. Feel free to continue this discussion if you have any doubts.

qindj commented 6 years ago

Thanks! Training scripts would be great for learning. +1

AtaaEddin commented 5 years ago

Hi @iitzco, thank you for sharing the code. I just want to ask about the last fully connected layer in the auxiliary network: it has five outputs corresponding to (P, x, y, w, h), right?

Can you explain more about this network? I want to do fine-tuning starting from your model.

adellelin commented 5 years ago

Hi,

Thanks for writing this up. Can this model also be used to return features? Or does it only detect faces and return bounding boxes?

Thanks!

mpottinger commented 5 years ago

I too am interested in training this; it is just what we need for custom object detection. Generic multi-class object detectors like YOLO are great for object detection competitions, but in the real world I care about detecting one or a few custom objects and doing it fast on a CPU. Work like this is what turns YOLO-type algorithms from a toy into a real tool.

arishin commented 5 years ago

Hi @iitzco, if there are any additional scripts you might have to set up training, I would appreciate them. Thanks for the example; I'm looking forward to training on other "few"-object sets for fun (for me, that means hands and heads...).

p30arena commented 5 years ago

@iitzco A little help here? The training process freezes when using dropout layers. What should I do?

I also use an arbitrary cls head to store the single class.

import numpy as np
import tensorflow as tf

# Assumed hyperparameters (not in the original snippet): input size, output
# grid size, boxes per cell, and the per-cell offset grid used by the loss.
img_shape, grid_shape, nb_boxes = 288, 9, 1
grid = np.array([[x, y] for y in range(grid_shape) for x in range(grid_shape)],
                dtype=np.float32).reshape(grid_shape * grid_shape, 1, 2)
grid_tensor = tf.constant(grid)

# (The model is built and compiled after custom_loss, below.)
i = tf.keras.layers.Input(shape=(img_shape, img_shape, 3))

x = tf.keras.layers.Conv2D(8, (9, 9), padding='same')(i)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(8, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)

x = tf.keras.layers.Conv2D(16, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(16, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)

x = tf.keras.layers.Conv2D(32, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(32, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)

x = tf.keras.layers.Conv2D(64, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(64, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)

x = tf.keras.layers.Conv2D(128, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(128, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)

cx1 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx1 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx1)
cx1 = tf.keras.layers.BatchNormalization()(cx1)
cx1 = tf.keras.layers.Dropout(0.5)(cx1)
cx1 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx1)
cx1 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx1)
cx1 = tf.keras.layers.BatchNormalization()(cx1)
cx1 = tf.keras.layers.Dropout(0.5)(cx1)

cx2 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx2 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx2)
cx2 = tf.keras.layers.BatchNormalization()(cx2)
cx2 = tf.keras.layers.Dropout(0.5)(cx2)
cx2 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx2)
cx2 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx2)
cx2 = tf.keras.layers.BatchNormalization()(cx2)
cx2 = tf.keras.layers.Dropout(0.5)(cx2)

cx3 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx3 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx3)
cx3 = tf.keras.layers.BatchNormalization()(cx3)
cx3 = tf.keras.layers.Dropout(0.5)(cx3)
cx3 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx3)
cx3 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx3)
cx3 = tf.keras.layers.BatchNormalization()(cx3)
cx3 = tf.keras.layers.Dropout(0.5)(cx3)

cx4 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx4 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx4)
cx4 = tf.keras.layers.BatchNormalization()(cx4)
cx4 = tf.keras.layers.Dropout(0.5)(cx4)
cx4 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx4)
cx4 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx4)
cx4 = tf.keras.layers.BatchNormalization()(cx4)
cx4 = tf.keras.layers.Dropout(0.5)(cx4)

# Each head keeps a trailing channel dimension so that concatenating along the
# last axis yields (batch, 9, 9, 6) = [cls, x, y, w, h, conf] per cell, which
# reshapes cleanly to (81, 6).
cls = tf.keras.layers.Conv2D(1, (1, 1), padding='same', activation='sigmoid')(cx1)
prob = tf.keras.layers.Conv2D(1, (1, 1), padding='same', activation='sigmoid')(cx2)
xy_center = tf.keras.layers.Conv2D(2, (1, 1), padding='same', activation='sigmoid')(cx3)
wh = tf.keras.layers.Conv2D(2, (1, 1), padding='same', activation='sigmoid')(cx4)

x = tf.keras.layers.concatenate([cls, xy_center, wh, prob])
x = tf.keras.layers.Reshape((grid_shape * grid_shape, nb_boxes * 5 + 1))(x)
def custom_loss(y_true, y_pred):
  K = tf.keras.backend

  # Optionally force all values positive before comparing (left disabled):
  # y_pred = K.abs(y_pred)
  # y_true = K.abs(y_true)

  y_true_class = y_true[..., 0:1]
  y_pred_class = y_pred[..., 0:1]

  pred_boxes = K.reshape(y_pred[..., 1:], (-1, grid_shape * grid_shape, nb_boxes, 5))
  true_boxes = K.reshape(y_true[..., 1:], (-1, grid_shape * grid_shape, nb_boxes, 5))

  y_pred_xy = pred_boxes[..., 0:2] + grid_tensor
  y_pred_wh = pred_boxes[..., 2:4]
  y_pred_conf = pred_boxes[..., 4]

  y_true_xy = true_boxes[..., 0:2]
  y_true_wh = true_boxes[..., 2:4]
  y_true_conf = true_boxes[..., 4]

  clss_loss = K.sum(K.square(y_true_class - y_pred_class), axis=-1)
  xy_loss = K.sum(K.sum(K.square(y_true_xy - y_pred_xy), axis=-1) * y_true_conf, axis=-1)
  wh_loss = K.sum(K.sum(K.square(K.sqrt(y_true_wh) - K.sqrt(y_pred_wh)), axis=-1) * y_true_conf, axis=-1)

  # Adding the confidence term lowers box quality slightly, but we gain an
  # estimate of how good each box is; it also makes training a bit unstable.

  # Per-axis overlap of the two boxes, from their center distance and sizes.
  intersect_wh = K.maximum(K.zeros_like(y_pred_wh), (y_pred_wh + y_true_wh) / 2 - K.abs(y_pred_xy - y_true_xy))
  intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
  true_area = y_true_wh[..., 0] * y_true_wh[..., 1]
  pred_area = y_pred_wh[..., 0] * y_pred_wh[..., 1]
  union_area = pred_area + true_area - intersect_area
  iou = intersect_area / (union_area + K.epsilon())  # avoid 0/0 on empty cells

  conf_loss = K.sum(K.square(y_true_conf * iou - y_pred_conf) * y_true_conf, axis=-1)

  d = clss_loss + xy_loss + wh_loss + conf_loss

  return d
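
# Build and compile after custom_loss is defined; the compile call cannot run
# before the model and the loss function exist.
model = tf.keras.Model(inputs=i, outputs=x)
adam = tf.keras.optimizers.Adam(lr=0.001, beta_1=0.9, beta_2=0.999, decay=0.0)
model.compile(loss=custom_loss, optimizer=adam, metrics=['accuracy'])
# If training still stalls, one likely culprit is NaNs from the 0/0 in the IoU
# term; the epsilon added above guards against that.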