pierluigiferrari / ssd_keras

A Keras port of Single Shot MultiBox Detector
Apache License 2.0

Questions related to training on a custom dataset #49

Closed adamuas closed 6 years ago

adamuas commented 6 years ago

Hi,

I managed to train a model on a custom dataset, but I have run into an issue where the model sometimes makes no prediction at all for a logo, even when there is one in the image. I decided to measure this and called it "hit rate", a hit being any prediction, regardless of whether it's correct. On my test set the hit rate is quite low: the model makes a prediction on only 13.48% of the images.
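For what it's worth, the "hit rate" described above can be sketched like this (the function and variable names are hypothetical, not from this repo): the fraction of images on which the detector returns at least one box, correct or not.

```python
def hit_rate(predictions_per_image):
    """Fraction of images with at least one predicted box.

    predictions_per_image: a list with one list of predicted boxes per image
    (empty list = the model predicted nothing for that image).
    """
    hits = sum(1 for preds in predictions_per_image if len(preds) > 0)
    return hits / len(predictions_per_image)

# e.g. 8 test images, detections on only 2 of them:
rate = hit_rate([['box'], [], [], ['box'], [], [], [], []])
print(rate)  # 0.25
```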

At the moment I have set it aside to work on other things while I think about what the potential problems could be. I wanted to ask whether you have experienced something similar.

My suspicions (ordered from most to least probable):

Thanks in advance.

pierluigiferrari commented 6 years ago

I have very limited information about your dataset, how much you trained the model, or even which model you're trying to train to begin with, so it could be many things. One possible reason, if it's not making a lot of confident predictions, is simply that it hasn't been trained enough. I see this all the time when I train a model from scratch: after the first couple of hundred or thousand training steps, the model predicts almost nothing with high confidence (except background), so after confidence thresholding you're left with no predictions at all. Then it starts getting better and better, first occasionally making a correct detection here and there on easy objects, then slowly detecting harder objects.
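The confidence-thresholding step mentioned above can be sketched as follows. The array layout here is illustrative, not this repo's exact decoder output: each row is `[class_id, confidence, xmin, ymin, xmax, ymax]`, and any row whose confidence is below the threshold is discarded, which is why an under-trained model can yield zero final detections.

```python
import numpy as np

def threshold_detections(decoded, conf_thresh=0.5):
    """Keep only decoded boxes whose confidence (column 1) meets the threshold."""
    decoded = np.asarray(decoded, dtype=float)
    return decoded[decoded[:, 1] >= conf_thresh]

dets = [[1, 0.92, 10, 10, 50, 50],
        [3, 0.07, 20, 30, 40, 60],   # early in training, most rows look like this
        [2, 0.15,  5,  5, 25, 25]]

print(threshold_detections(dets).shape)  # (1, 6): only the confident box survives
```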

As for your own conjectures:

adamuas commented 6 years ago

Sorry for the late response.

I am training the SSD300 model with 13 classes, with roughly 150 images per class for training (the rest is my test set, i.e. at least 50 images per class). I set up early stopping with a patience of 100 and a min_delta of 0.001 to avoid it stopping too early. Because I had limited training data, I used the training data plus noise as my validation data (the noise was introduced by the image augmentations).

VGG16BASE_FREEZE = ['input_1', 'conv1_1', 'conv1_2', 'pool1',
                    'conv2_1', 'conv2_2', 'pool2',
                    'conv3_1', 'conv3_2', 'conv3_3', 'pool3',
                    'conv4_1', 'conv4_2', 'conv4_3', 'pool4',
                    'conv5_1', 'conv5_2', 'conv5_3', 'pool5']
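(For context, a freeze list like this is applied by flipping each layer's `trainable` flag and then recompiling; the sketch below assumes a Keras `model` object, which is not shown here.)

```python
def freeze_base(model, freeze_names=None):
    """Set trainable=False on every layer whose name is in freeze_names.

    `model` is assumed to be a Keras model whose layer names match the
    VGG16BASE_FREEZE list above; all other layers stay trainable.
    """
    if freeze_names is None:
        freeze_names = []
    for layer in model.layers:
        layer.trainable = layer.name not in freeze_names
    return model

# After calling freeze_base(model, VGG16BASE_FREEZE), recompile
# (model.compile(...)) so the changed trainable flags take effect.
```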

How many epochs do you reckon I should train for, in your experience?

pierluigiferrari commented 6 years ago

One suggestion would be to load the weights of one of the fully trained SSD300 models rather than starting to train with only the trained VGG16 weights. See my first reply to #50 for how to circumvent the problem that the number of classes in your dataset (13) differs from the number of classes of the trained models (20 for Pascal VOC, 80 for MS COCO, or 200 for ImageNet).

I don't know what your logo images look like, but I assume they are very different from any of the object categories in Pascal VOC, MS COCO, or ImageNet. Nonetheless, it's probably fair to assume that any trained weights are always a better starting point to fine-tune the model on your dataset than randomly initialized weights, even if your objects of interest are very different from the objects the models were trained on. Loading trained model weights would likely improve your results tremendously and save you a lot of training time.

It's hard to say how many training steps (let's use training steps as the metric rather than epochs) you would need until you get half-decent results if you start out with only the VGG16 weights, but my best guess would be in the ballpark of a few tens of thousands.

But once again, I would recommend starting out by fine-tuning one of the fully trained models. Sub-sampling the weight tensors of the classification predictor layers sounds more tedious than it is; at the end of the day it's just a bit of NumPy slicing. Alternatively, just changing the names of the classification predictor layers would be the really easy (and slightly worse) way.
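The NumPy slicing in question can be sketched roughly like this (the shapes and the choice of which trained classes to keep are illustrative assumptions, not this repo's exact layout): a classification predictor's last kernel axis holds `n_boxes * n_classes` columns, so you keep only the columns for the classes you want, repeated once per box.

```python
import numpy as np

# Hypothetical shapes for one SSD300 classification predictor layer
# trained on Pascal VOC: 20 classes + 1 background, 4 boxes per cell.
n_boxes = 4
n_classes_trained = 21   # 20 Pascal VOC classes + background
n_classes_new = 14       # 13 logo classes + background

kernel = np.random.randn(3, 3, 512, n_boxes * n_classes_trained)
bias = np.random.randn(n_boxes * n_classes_trained)

# Choose which trained class indices to keep (index 0 is background;
# the remaining 13 are picked arbitrarily here for illustration).
keep = np.arange(n_classes_new)

# The last axis is laid out as [box0: class0..class20, box1: class0.., ...],
# so offset the kept indices by one class-block per box.
cols = np.concatenate([keep + b * n_classes_trained for b in range(n_boxes)])

sub_kernel = kernel[..., cols]
sub_bias = bias[cols]

print(sub_kernel.shape)  # (3, 3, 512, 56) -> 4 boxes * 14 classes
print(sub_bias.shape)    # (56,)
```

The sub-sampled tensors would then be set on a freshly built 13-class model's predictor layers; the linked weight_sampling_tutorial.ipynb walks through the real procedure.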

adamuas commented 6 years ago

Thanks, appreciate this!

I will give it a try with the ImageNet weights as a starting point.

pierluigiferrari commented 6 years ago

Yeah, the ImageNet weights will probably be a good starting point. I've created a notebook that does the weight sub-sampling for you:

https://github.com/pierluigiferrari/ssd_keras/blob/master/weight_sampling_tutorial.ipynb

adamuas commented 6 years ago

Thanks a lot @pierluigiferrari, appreciate this :+1: :+1: