rykov8 / ssd_keras

Port of Single Shot MultiBox Detector to Keras
MIT License

Modifying SSD to support multiple labels per bounding box output? #32

Open neil454 opened 7 years ago

neil454 commented 7 years ago

Hello, thank you so much for this Keras implementation of SSD!

I have successfully ported the caffe weights for SSD trained on the COCO dataset (300x300 input, 80+1 classes), and now I'm trying to utilize these weights to help retrain SSD on my specific problem.

I need SSD to output 200-some attributes instead of 81 object classes, and since one object can have multiple attributes, I need SSD to output class scores that don't sum to 1.

So I tried just re-training without any major changes (had to randomly initialize the weights of 6 layers that relied on COCO's 81 class output, but loaded the rest just fine), and my training loss was stuck at around 200.

I then realized this would never train because the class scores are normalized to sum to 1, so I changed the activation on the last layer of SSD from "softmax" to "sigmoid" (or should I use "tanh" instead?). Training now seems to be going well, though I won't know for a while: the loss started at 32 and is down to 7 after 8000 samples, still decreasing nicely.
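For reference, this is roughly the activation swap I mean (a minimal sketch; the exact layer name in ssd.py may differ, so treat it as an assumption):

from keras.layers import Activation

# Before (single-label): scores are pushed through a softmax so they sum to 1
# across the 81 COCO classes.
# mbox_conf = Activation('softmax', name='mbox_conf_final')(mbox_conf)

# After (multi-label): an element-wise sigmoid gives each attribute an
# independent probability in [0, 1].
mbox_conf = Activation('sigmoid', name='mbox_conf_final')(mbox_conf)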

Anyway, I was wondering about SSD's custom loss function, since I see it uses a softmax loss for conf_loss in ssd_training.py. Should I change this to some other loss function? If so, which one?

rykov8 commented 7 years ago

@neil454 sorry for the late reply. I believe you need to change the last activation to sigmoid (which you have already done), use binary crossentropy as the loss function for conf_loss, and average it along classes. Here is some example code (I don't guarantee that it works) for the method in class MultiboxLoss, mainly taken from Keras' binary crossentropy loss:

def _multilabel_loss(self, y_true, y_pred):
    """Compute multilabel loss.

    # Arguments
        y_true: Ground truth targets,
            tensor of shape (?, num_boxes, num_classes).
        y_pred: Predicted sigmoid probabilities,
            tensor of shape (?, num_boxes, num_classes).

    # Returns
        multilabel_loss: Multilabel loss, tensor of shape (?, num_boxes).
    """
    # clip to be sure not to compute log(0)
    y_pred = tf.clip_by_value(y_pred, 1e-15, 1 - 1e-15)
    # convert probabilities back to logits, since the model already applies a sigmoid
    logits = tf.log(y_pred / (1 - y_pred))
    # element-wise binary crossentropy: one value per box and per class
    multilabel_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=y_true,
                                                              logits=logits)
    # sigmoid_cross_entropy_with_logits does not reduce, so average along classes here
    multilabel_loss = tf.reduce_mean(multilabel_loss, axis=-1)
    return multilabel_loss
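Then, in compute_loss, you would use it in place of the softmax confidence term. A hedged sketch (the [:, :, 4:-8] slicing mirrors the existing _softmax_loss call and is an assumption here):

# Hypothetical swap inside MultiboxLoss.compute_loss: 4 loc values in front,
# 8 prior/variance values at the end, classes in between.
conf_loss = self._multilabel_loss(y_true[:, :, 4:-8],
                                  y_pred[:, :, 4:-8])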
neil454 commented 7 years ago

Thanks for the reply!

I tried that code for conf_loss, but I ended up with a model that returned way too many detections. I'd expect most of these boxes to have a high confidence on the background class, but that's rarely the case (I haven't seen background confidence above 0.7 for any of the boxes).

Have you, or anyone else, successfully trained on regular VOC2007 or COCO using just the base VGG pre-trained weights? I just tried this and couldn't get good results (it usually only detected persons at 0.6-0.7 confidence, and rarely any other class), even though I tried to follow the SSD paper's training scheme.

Therefore, I don't know if the problem lies with my unique problem (data & custom loss) or the training code itself.

rykov8 commented 7 years ago

@neil454 sorry for the late reply again. I haven't trained on regular VOC2007, but training works quite well on my own dataset. I believe there is a bug in the generator's random_sized_crop method in the repo; unfortunately, I have no time to fix it now, so if you use random_sized_crop, you may try switching it off.
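Something like this when building the generator (a sketch based on the Generator class in the training notebook; the argument names and values are assumptions and may differ in your copy):

# Hypothetical call: do_crop=False makes the generator skip random_sized_crop.
gen = Generator(gt, bbox_util, batch_size=16, path_prefix=path_prefix,
                train_keys=train_keys, val_keys=val_keys,
                image_size=(300, 300), do_crop=False)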

oarriaga commented 7 years ago

@neil454 Hello Neil, I have successfully trained on VOC2007. Unfortunately, I didn't implement any metrics, so I couldn't explicitly compare with the original SSD paper. However, I did disable random_sized_crop. The best validation loss I got was 2.10 at iteration 09.

neil454 commented 7 years ago

@oarriaga Hello and thanks for the info.

I've always had do_crop=False, so that wouldn't be an issue.

When you trained on VOC2007, I'm assuming you didn't train completely from scratch, so how did you load the base VGG network weights for the first part of SSD?

Since I'm only trying to verify that this works, I took the Keras weights that @rykov8 provided and loaded just the VGG portion, instead of converting the actual VGG weights.
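Concretely, this is roughly what I mean by loading just the VGG portion (a sketch; the weights file and layer-name prefixes are assumptions):

# Load only the layers whose names match the checkpoint (the VGG base here).
model.load_weights('weights_SSD300.hdf5', by_name=True)

# Optionally freeze the VGG base so only the SSD head is retrained.
for layer in model.layers:
    if layer.name.startswith(('conv1_', 'conv2_', 'conv3_', 'conv4_')):
        layer.trainable = False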

Also, what value did you use for neg_pos_ratio in MultiboxLoss? If I use 2.0 or 3.0, I end up with a model that rarely detects anything but person (too many negatives). I just tried lowering it to 1.0, and now I get a model that can detect most objects, but there are usually several bboxes and many false positives, although I'm not done training this model.
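For context, this is where I'm setting the ratio when compiling (a sketch; the optimizer and class count are placeholders):

# Hypothetical compile call: NUM_CLASSES and the optimizer are placeholders.
model.compile(optimizer='adam',
              loss=MultiboxLoss(NUM_CLASSES, neg_pos_ratio=2.0).compute_loss)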

Did you ever encounter any of these issues when training on VOC2007? In general, how does your trained model's performance compare to the provided weights (ported directly from Caffe)?