How was faced trained on the WIDER FACE dataset and how was the dataset prepared?
Open arkeys opened 6 years ago
Hi @arkeys
I will try to upload all the training scripts soon. However, let me explain briefly how I trained faced:
1°) The WIDER FACE dataset was transformed to tfrecord format using this repo.
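For reading those records back at training time, something like the following works. Note that the feature keys and filename here are assumptions about how that conversion repo serializes examples, not taken from the source:

```python
import tensorflow as tf

# Hypothetical feature spec -- the actual keys depend on the conversion repo.
feature_spec = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
}

def parse_example(serialized):
    parsed = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(parsed['image/encoded'], channels=3)
    # One row per face: [x0, x1, y0, y1], normalized to the image size.
    boxes = tf.stack([tf.sparse.to_dense(parsed['image/object/bbox/%s' % k])
                      for k in ('xmin', 'xmax', 'ymin', 'ymax')], axis=-1)
    return image, boxes

dataset = tf.data.TFRecordDataset('widerface_train.tfrecord').map(parse_example)
```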
2°) For each example (which consisted of an image with several [x0, x1, y0, y1] bounding boxes), I prepared the following:
- `object_matrix`: a 9x9 matrix containing 1 if the cell contains the center point of a face, 0 otherwise.
- `center_x_matrix`: a 9x9 matrix containing the cell-relative value of the x center of a face in those cells containing the center of a face, 0 otherwise. For example, if the center of a face is at {x: 0.5, y: 0.5} (normalized to the image size), then `center_x_matrix[4][4]` would contain 0.5. Recall that all values in this matrix are relative to the cell.
- `center_y_matrix`: a 9x9 matrix containing the cell-relative value of the y center of a face in those cells containing the center of a face, 0 otherwise.
- `w_matrix`: a 9x9 matrix containing the normalized width of a face in those cells containing the center of a face, 0 otherwise.
- `h_matrix`: a 9x9 matrix containing the normalized height of a face in those cells containing the center of a face, 0 otherwise.

These preparation steps are similar to the ones YOLO performs. You can read more about these steps here.
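To make step 2°) concrete, here is a minimal NumPy sketch of building those five target matrices; the helper name and the `min` clamp are my own choices, assuming boxes are normalized [x0, x1, y0, y1]:

```python
import numpy as np

GRID = 9

def make_targets(boxes):
    """boxes: list of normalized [x0, x1, y0, y1] face boxes for one image."""
    object_matrix = np.zeros((GRID, GRID), dtype=np.float32)
    center_x_matrix = np.zeros((GRID, GRID), dtype=np.float32)
    center_y_matrix = np.zeros((GRID, GRID), dtype=np.float32)
    w_matrix = np.zeros((GRID, GRID), dtype=np.float32)
    h_matrix = np.zeros((GRID, GRID), dtype=np.float32)
    for x0, x1, y0, y1 in boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2       # center, image-relative
        col = min(int(cx * GRID), GRID - 1)          # cell holding the center
        row = min(int(cy * GRID), GRID - 1)
        object_matrix[row][col] = 1.0
        center_x_matrix[row][col] = cx * GRID - col  # relative to the cell
        center_y_matrix[row][col] = cy * GRID - row
        w_matrix[row][col] = x1 - x0                 # normalized to the image
        h_matrix[row][col] = y1 - y0
    return object_matrix, center_x_matrix, center_y_matrix, w_matrix, h_matrix
```

For the {x: 0.5, y: 0.5} example above, this puts 0.5 in `center_x_matrix[4][4]`, as described.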
3°) Horizontal flips of all examples were done to augment the dataset.
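Since the boxes are normalized, the flip only remaps the x coordinates; a small sketch (my own helper, same box convention as above):

```python
def horizontal_flip(image, boxes):
    """Mirror an image and its normalized [x0, x1, y0, y1] boxes."""
    flipped = image[:, ::-1, :]  # reverse the width axis of an (H, W, C) array
    # After mirroring, x0 and x1 swap roles: new_x0 = 1 - old_x1, new_x1 = 1 - old_x0.
    flipped_boxes = [[1 - x1, 1 - x0, y0, y1] for x0, x1, y0, y1 in boxes]
    return flipped, flipped_boxes
```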
4°) For each forward pass, a 9x9x5 tensor was obtained (after a series of convolutions/pooling/dropout/BN layers):
- The first channel is compared against the `object_matrix`. This is `loss1`.
- The next two channels are passed through a `sigmoid` (to normalize them) and an L2 loss is computed against the `center_x_matrix` and the `center_y_matrix`. All the cell values that are 0 because they do not contain a face are ignored in the L2 loss. This is `loss2`.
- The last two channels are passed through a `sigmoid` (to normalize them) and an L2 loss is computed against the `w_matrix` and the `h_matrix`. All the cell values that are 0 because they do not contain a face are ignored in the L2 loss. This is `loss3`.
- The total loss is `loss1 + loss2 + loss3`.
5°) Training was done using the Adam optimizer with a fixed learning rate of 5e-5.
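A minimal sketch of steps 4°) and 5°), assuming the network output `pred` is a 9x9x5 tensor ordered as [objectness, cx, cy, w, h]; the ordering and names are assumptions, and since the comparison used for `loss1` is not spelled out above, sigmoid cross-entropy is used here as a natural choice for a 0/1 target:

```python
import tensorflow as tf

def detection_loss(pred, object_matrix, center_x, center_y, w_mat, h_mat):
    # loss1: objectness vs. the 0/1 object_matrix (sigmoid cross-entropy).
    loss1 = tf.reduce_sum(tf.nn.sigmoid_cross_entropy_with_logits(
        labels=object_matrix, logits=pred[..., 0]))
    mask = object_matrix  # zero cells (no face) are ignored in the L2 losses
    # loss2: L2 on the sigmoid-normalized, cell-relative centers.
    centers = tf.sigmoid(pred[..., 1:3])
    loss2 = tf.reduce_sum(mask * (tf.square(centers[..., 0] - center_x) +
                                  tf.square(centers[..., 1] - center_y)))
    # loss3: L2 on the sigmoid-normalized width/height.
    sizes = tf.sigmoid(pred[..., 3:5])
    loss3 = tf.reduce_sum(mask * (tf.square(sizes[..., 0] - w_mat) +
                                  tf.square(sizes[..., 1] - h_mat)))
    return loss1 + loss2 + loss3

optimizer = tf.keras.optimizers.Adam(learning_rate=5e-5)  # fixed LR, as in 5°)
```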
For more information about the architecture, you can read my recent post here.
You should be able to repeat this training process with any object class you wish to perform object detection with.
Hope this helps. Feel free to continue this discussion if you have any doubts.
Thanks! I need the training scripts for learning too. +1
Hi @iitzco Thank you for sharing the code. I just want to ask you about the last fully connected layer in the Auxiliary network: it has five cells corresponding to (P, x, y, w, h), right?
Can you explain more about this network? I want to do fine-tuning starting from your model.
Hi,
Thanks for writing this up. Can this model also be used to return features, or does it only detect faces and return bounding boxes?
Thanks!
I too am interested in training this; it is just what we need for custom object detection. Generic multi-class object detectors like YOLO are great for object detection competitions, but in the real world I care about detecting one or a few custom objects and doing it fast on a CPU. Stuff like this is needed to make YOLO-type algorithms more than a toy and into a real tool.
Hi @iitzco, if there are any additional scripts you might have to set up training, I would appreciate them. Thanks for the example - looking forward to training on other "few" object sets for fun (for me I'm looking for hands and heads...)
@iitzco A little help here? The training process "freezes" when using dropout. What should I do?
I also use an arbitrary `cls` channel to store one class:
import numpy as np
import tensorflow as tf

# The original snippet did not define these; the values below are inferred:
# five 2x2 max-poolings reduce 288 -> 9 (the 9x9 grid), and six output
# channels imply nb_boxes * 5 + 1 with nb_boxes = 1.
img_shape = 288
grid_shape = 9
nb_boxes = 1

# Per-cell (x, y) offsets added to the predicted centers in custom_loss;
# shape (grid*grid, 1, 2) broadcasts against pred_boxes[..., 0:2].
grid = np.array([[[float(c % grid_shape), float(c // grid_shape)]]
                 for c in range(grid_shape * grid_shape)], dtype=np.float32)

# Backbone: 288x288x3 input; five conv/pool blocks reduce it to 9x9x128.
i = tf.keras.layers.Input(shape=(img_shape, img_shape, 3))
x = tf.keras.layers.Conv2D(8, (9, 9), padding='same')(i)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(8, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Conv2D(16, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(16, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Conv2D(32, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(32, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Conv2D(64, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(64, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)
x = tf.keras.layers.Conv2D(128, (9, 9), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.Conv2D(128, (1, 1), padding='same')(x)
x = tf.keras.layers.LeakyReLU(alpha=0.3)(x)
x = tf.keras.layers.BatchNormalization()(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), padding='valid')(x)
x = tf.keras.layers.Dropout(0.5)(x)
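# Four parallel branches (cx1..cx4) feed the four prediction heads below.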
cx1 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx1 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx1)
cx1 = tf.keras.layers.BatchNormalization()(cx1)
cx1 = tf.keras.layers.Dropout(0.5)(cx1)
cx1 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx1)
cx1 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx1)
cx1 = tf.keras.layers.BatchNormalization()(cx1)
cx1 = tf.keras.layers.Dropout(0.5)(cx1)
cx2 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx2 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx2)
cx2 = tf.keras.layers.BatchNormalization()(cx2)
cx2 = tf.keras.layers.Dropout(0.5)(cx2)
cx2 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx2)
cx2 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx2)
cx2 = tf.keras.layers.BatchNormalization()(cx2)
cx2 = tf.keras.layers.Dropout(0.5)(cx2)
cx3 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx3 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx3)
cx3 = tf.keras.layers.BatchNormalization()(cx3)
cx3 = tf.keras.layers.Dropout(0.5)(cx3)
cx3 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx3)
cx3 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx3)
cx3 = tf.keras.layers.BatchNormalization()(cx3)
cx3 = tf.keras.layers.Dropout(0.5)(cx3)
cx4 = tf.keras.layers.Conv2D(192, (9, 9), padding='same')(x)
cx4 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx4)
cx4 = tf.keras.layers.BatchNormalization()(cx4)
cx4 = tf.keras.layers.Dropout(0.5)(cx4)
cx4 = tf.keras.layers.Conv2D(192, (1, 1), padding='same')(cx4)
cx4 = tf.keras.layers.LeakyReLU(alpha=0.3)(cx4)
cx4 = tf.keras.layers.BatchNormalization()(cx4)
cx4 = tf.keras.layers.Dropout(0.5)(cx4)
# Output heads: 1x1 convolutions with sigmoid activations. The channel
# dimension is kept so that, after the Reshape, each grid cell's six values
# stay contiguous in the order custom_loss expects: [cls, x, y, w, h, conf].
cls = tf.keras.layers.Conv2D(1, (1, 1), padding='same', activation='sigmoid')(cx1)
prob = tf.keras.layers.Conv2D(1, (1, 1), padding='same', activation='sigmoid')(cx2)
xy_center = tf.keras.layers.Conv2D(2, (1, 1), padding='same', activation='sigmoid')(cx3)
wh = tf.keras.layers.Conv2D(2, (1, 1), padding='same', activation='sigmoid')(cx4)
x = tf.keras.layers.concatenate([cls, xy_center, wh, prob])
x = tf.keras.layers.Reshape((grid_shape * grid_shape, (nb_boxes * 5 + 1)))(x)
@tf.function
def custom_loss(y_true, y_pred):
    K = tf.keras.backend
    # Constant cell offsets, created at trace time so they live in the right graph.
    grid_tensor = K.constant(grid)
    # # make sure all values are positive?
    # y_pred = K.abs(y_pred)
    # y_true = K.abs(y_true)
    y_true_class = y_true[..., 0:1]
    y_pred_class = y_pred[..., 0:1]
    pred_boxes = K.reshape(y_pred[..., 1:], (-1, grid_shape * grid_shape, nb_boxes, 5))
    true_boxes = K.reshape(y_true[..., 1:], (-1, grid_shape * grid_shape, nb_boxes, 5))
    # Predicted centers are cell-relative; adding the grid offsets makes them
    # absolute grid coordinates (targets must be encoded the same way).
    y_pred_xy = pred_boxes[..., 0:2] + grid_tensor
    y_pred_wh = pred_boxes[..., 2:4]
    y_pred_conf = pred_boxes[..., 4]
    y_true_xy = true_boxes[..., 0:2]
    y_true_wh = true_boxes[..., 2:4]
    y_true_conf = true_boxes[..., 4]
    clss_loss = K.sum(K.square(y_true_class - y_pred_class), axis=-1)
    xy_loss = K.sum(K.sum(K.square(y_true_xy - y_pred_xy), axis=-1) * y_true_conf, axis=-1)
    wh_loss = K.sum(K.sum(K.square(K.sqrt(y_true_wh) - K.sqrt(y_pred_wh)), axis=-1) * y_true_conf, axis=-1)
    # Adding the confidence term lowers box quality slightly but yields an
    # estimate of how good each predicted box is; training is a bit unstable.
    # Per-axis overlap from the center distance: (w_pred + w_true) / 2 - |dx|, clipped at 0.
    intersect_wh = K.maximum(K.zeros_like(y_pred_wh), (y_pred_wh + y_true_wh) / 2 - K.abs(y_pred_xy - y_true_xy))
    intersect_area = intersect_wh[..., 0] * intersect_wh[..., 1]
    true_area = y_true_wh[..., 0] * y_true_wh[..., 1]
    pred_area = y_pred_wh[..., 0] * y_pred_wh[..., 1]
    union_area = pred_area + true_area - intersect_area
    # Epsilon avoids 0/0 -> NaN in cells with no face (NaN * 0 is still NaN).
    iou = intersect_area / (union_area + K.epsilon())
    conf_loss = K.sum(K.square(y_true_conf * iou - y_pred_conf) * y_true_conf, axis=-1)
    return clss_loss + xy_loss + wh_loss + conf_loss

model = tf.keras.Model(inputs=i, outputs=x)
adam = tf.keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)
model.compile(loss=custom_loss, optimizer=adam, metrics=['accuracy'])
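In case it helps anyone reproducing this, here is a hedged sketch of turning the model's (81, 6) output back into pixel boxes; the threshold, the `c * conf` score, and the function name are my own choices, not from the snippet above:

```python
import numpy as np

def decode_predictions(pred, img_size=288, threshold=0.5):
    """pred: one (grid_shape * grid_shape, 6) prediction, row-major over the grid."""
    boxes = []
    for idx, (c, cx, cy, w, h, conf) in enumerate(pred):
        if c * conf < threshold:
            continue
        row, col = divmod(idx, grid_shape)
        px = (col + cx) / grid_shape * img_size  # cell-relative center -> pixels
        py = (row + cy) / grid_shape * img_size
        pw, ph = w * img_size, h * img_size
        boxes.append((px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2, c * conf))
    return boxes

# Hypothetical training call -- X: (N, 288, 288, 3) images, Y: (N, 81, 6) targets:
# model.fit(X, Y, batch_size=16, epochs=50)
```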