pierluigiferrari / ssd_keras

A Keras port of Single Shot MultiBox Detector
Apache License 2.0

Overfitting problem for custom data #59

Closed TanmoyDL closed 6 years ago

TanmoyDL commented 6 years ago

I have trained SSD7 on my own custom data (same format as the Udacity dataset, including the same number of classes), but the prediction results are horrible. The training set has 8K images and the validation set has 3K. The validation loss is increasing, i.e. the model is overfitting. The same data has been used with the TensorFlow object detection API and the final output is good. From the attached figure I can clearly see that it is overfitting. How do I tackle this issue?

figure_1-2 (attached loss plot)

adamuas commented 6 years ago

SSD7 is a much smaller network, so I think it would be hard for it to overfit 8K training images, especially given the number of epochs (unless you have a lot of training steps per epoch). Did you train the network from scratch or start with pre-trained SSD weights? When you say validation set, are you referring to the test set on your graph or to validation data used during training? Your validation set could be a noisy version of your training set; alternatively, if you can spare some data, you could partition your training data to get a validation set that is also passed to the training function and evaluated after each epoch. You could also experiment with early stopping on your validation loss; Keras has a callback for this: https://keras.io/callbacks/.
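
Something along these lines, as a minimal sketch (the generator names, step counts, and checkpoint path are placeholders, not taken from this repo):

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint

# Stop training once the validation loss has not improved for 10 epochs
# and keep the weights of the best epoch on disk.
callbacks = [
    EarlyStopping(monitor='val_loss', patience=10, verbose=1),
    ModelCheckpoint('ssd7_best_weights.h5', monitor='val_loss',
                    save_best_only=True, verbose=1)
]

# `model`, `train_generator`, `val_generator` and the step counts stand in
# for whatever you pass to fit_generator() in your own training script.
history = model.fit_generator(generator=train_generator,
                              steps_per_epoch=1000,
                              epochs=100,
                              callbacks=callbacks,
                              validation_data=val_generator,
                              validation_steps=100)
```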

pierluigiferrari commented 6 years ago

Are you sure your loss diagram above is correct? The plotted graphs seem unrealistic. It can't possibly be that a model overfits right from the start, unless the training and validation datasets have pretty much nothing in common. Something is wrong here, likely something about the dataset or the way the annotations were processed.

If you know that your training is successful using the TensorFlow object detection API, that's a good starting point to solve your problem. So what's different between the two models and training processes? Specifically,

  1. Did you use the exact same dataset with the exact same annotations in both cases?
  2. Did you use similar/identical data augmentation in both cases? If not, what's different?
  3. What model architecture did you use with the TensorFlow object detection API? Pre-trained or from scratch?
  4. Did you use any form of regularization (besides batch normalization) such as L2-regularization or dropout when you trained using the TensorFlow object detection API? If yes, did you use the same regularization when you tried to train SSD7?
  5. Are you using this implementation with Python 2 or Python 3?
TanmoyDL commented 6 years ago

Sorry, I made a mistake while plotting the history, but that was not the issue with my custom data. Yes, it's a small network. The same data works fine with the TensorFlow object detection API, whereas the SSD Keras model could not detect the objects. I have not used any pre-trained weights. I am sharing the prediction output below. The model was trained for 50 epochs. The prediction image was not part of the validation set, but it is the same size.

output (attached prediction image)

pierluigiferrari commented 6 years ago

See my questions in the comment above. Also, can you share the correct loss history?

TanmoyDL commented 6 years ago

Hi @pierluigiferrari, thank you for your kind reply. One of my teammates followed the link https://pythonprogramming.net/custom-objects-tracking-tensorflow-object-detection-api-tutorial/?completed=/video-tensorflow-object-detection-api-tutorial/ to train on our own custom data and got good results while testing. I am sharing a zip file with the training and prediction code. I am now training the model for 100 epochs (batch size = 32, l2_regularization = 0.0005). Please let me know if I have made any mistakes in the code.

train and prediction code.tar.gz

pierluigiferrari commented 6 years ago

Two things I noticed in the code you attached:

  1. Note that OpenCV loads images in BGR format, but if you used BatchGenerator with swap_channels=False (which you did according to your code, since that is the default argument value), then your model was trained on RGB images. This cannot be the only reason why your model is performing poorly, but the wrong channel order will reduce the model's performance significantly. I would recommend loading images with scipy.misc.imread() instead of OpenCV, or converting the channel order explicitly (see the sketch after this list).
  2. You are using your own code to draw the predicted boxes onto the image. Are you sure you're drawing the boxes correctly? There is in fact a truck in that image, plus a second object (a car); the predicted boxes are just not in the right place. The confidences seem alright (0.73 is pretty high), it's just the box coordinates that are off. This is a strong indicator that you made a mistake with the coordinates somewhere along the way, because it is very unlikely that the model successfully learned to predict that there is a truck in that image while at the same time being completely off about where that truck is. You might have drawn the predicted box coordinates incorrectly, scaled the image incorrectly, or made a mistake somewhere in the box coordinate format of the input data, in which case the model learned incorrect localization.
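
Regarding point 1, a minimal sketch of getting the channel order right at prediction time (the file path is illustrative, and `model` stands for your trained SSD7 model):

```python
import cv2
import numpy as np

# OpenCV returns images in BGR channel order, so convert to RGB before
# feeding the image to a model that was trained on RGB input.
img_bgr = cv2.imread('test_image.png')               # shape (H, W, 3), BGR
img_rgb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2RGB)   # now RGB

# Alternatively, load RGB directly, e.g. with scipy.misc.imread()
# (available in the older SciPy versions this repo was written against).

x = np.expand_dims(img_rgb, axis=0)  # add the batch dimension
# y_pred = model.predict(x)
```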

I recommend you train the model again, ideally using the ssd7_training.ipynb notebook and making as few changes as possible to the code, except of course setting the paths to the dataset etc. and setting the Keras callbacks during training as you need them. Make sure that the input coordinate format of the labels is correct during training, otherwise the model will learn only nonsense for localization. Let me know what you get.

TanmoyDL commented 6 years ago

@pierluigiferrari, thank you for your kind suggestions. I made a huge mistake while training the model: I used 300x480 images, the same size as the Udacity data, but the annotations were made on 600x480 images. If the SSD Keras model does not rescale the annotation data to the input image size, that would introduce a huge error, which is why the predicted bounding boxes are not accurate. I am currently generating a new dataset (300x480) to train the model on, and I will let you know the results after training on it.

pierluigiferrari commented 6 years ago

Glad to hear you (probably) found the problem!

vinodrajendran001 commented 6 years ago

@TanmoyDL and @pierluigiferrari I don't understand: the SSD7 code should work for varying image sizes, right?

I am also not getting good results. My dataset has images of different sizes, and the annotations were done accordingly.

As @TanmoyDL said, do I need to resize all the images and their corresponding annotations to the same size before training?

pierluigiferrari commented 6 years ago

@vinodrajendran001 yes, all images need to be of the same size for training. There are many ways to achieve this, e.g. random cropping, random padding, just plain resizing, etc. BatchGenerator offers all of the aforementioned options. And of course it converts the annotations accordingly.
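
For illustration only (this is not BatchGenerator's code), a minimal sketch of plain resizing with absolute corner coordinates (xmin, ymin, xmax, ymax) rescaled to match:

```python
import cv2
import numpy as np

def resize_with_boxes(image, boxes, target_h, target_w):
    """Resize `image` to (target_h, target_w) and rescale absolute corner
    coordinates (xmin, ymin, xmax, ymax) by the same factors."""
    h, w = image.shape[:2]
    resized = cv2.resize(image, (target_w, target_h))  # cv2 expects (width, height)
    scale_x = target_w / float(w)
    scale_y = target_h / float(h)
    boxes = np.array(boxes, dtype=np.float64)
    boxes[:, [0, 2]] *= scale_x  # xmin, xmax
    boxes[:, [1, 3]] *= scale_y  # ymin, ymax
    return resized, boxes
```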

There are multiple reasons why all images need to have the same size: training on images of varying sizes runs into technical (TensorFlow) issues, it doesn't fit the SSD model architecture, and, most importantly, it doesn't fit the general concept of convolutional neural networks.

TanmoyDL commented 6 years ago

Hi @pierluigiferrari, thanks for your kind reply. I have trained the SSD7 model for 20 epochs and the model can now track the objects. I am sharing the loss plot and the detection results (attached: figure_300_480, output). I changed BGR to RGB before calling model.predict() at test time. There are still some errors in the cv2 plotting function, but that is not an issue.

I have a few questions regarding the code:

  1. How many epochs do I need for training? I have used only 20 epochs, and I know that is not sufficient. The model's loss on the validation dataset is not decreasing, whereas the training loss is decreasing. Is that because of the data or because of the number of epochs?
  2. How did you calculate the scales = [0.08, 0.16, 0.32, 0.64, 0.96] used in the SSD7 model? How are they related to the anchor boxes? Thank you again for the suggestions above; I am waiting for your valuable advice.
pierluigiferrari commented 6 years ago
  1. It makes sense to keep training as long as the validation loss keeps decreasing. Once the validation loss has stopped decreasing for a few epochs in a row (maybe 5 or 10), I would stop training. Then use the weights from the epoch where the validation loss was lowest as your final weights.
  2. The scaling factors represent fractions of the total image size. They determine the size of the anchor boxes for the respective predictor layers, i.e. they determine the size of the objects that each predictor layer will learn to predict. Theoretically you could force any given predictor layer to try to learn to predict objects of any size, but in order to obtain good results it only really makes sense to choose the scaling factors such that the resulting anchor boxes lie entirely within the receptive fields of the respective predictor layers. I recommend reading the paper and/or the calculations in the code of keras_layer_AnchorBoxes.py to understand exactly how the scaling factors relate to the sizes of the anchor boxes (a rough sketch follows this list).
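
As a rough, paraphrased sketch of that relationship (see keras_layer_AnchorBoxes.py for the actual computation, including the special cases for aspect ratio 1):

```python
import numpy as np

def anchor_box_size(scale, aspect_ratio, img_height, img_width):
    """Approximate anchor box width/height for one predictor layer:
    the scaling factor is a fraction of the shorter image side."""
    size = min(img_height, img_width)
    box_width = scale * size * np.sqrt(aspect_ratio)
    box_height = scale * size / np.sqrt(aspect_ratio)
    return box_width, box_height

# e.g. scale 0.08 on a 300x480 image with aspect ratio 1.0
# gives roughly a 24x24 pixel anchor box.
print(anchor_box_size(0.08, 1.0, 300, 480))  # (24.0, 24.0)
```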

About your overfitting problem:

Some suggestions:

adamuas commented 6 years ago

@TanmoyDL Looking good :+1: It's impressive to get that with just 20 epochs of training. @pierluigiferrari I wanted to ask about multiple bounding boxes on one object. I understand that NMS (non-maximum suppression) is supposed to handle this, but what if, even after 500 epochs of training, it is still unable to suppress redundant bounding boxes for an object, as in my case and in the picture above around the leftmost car? What is the most likely issue?

My second question is about the probability with which the model detects the objects it was trained on in images. What is the usual case for a fully trained model (i.e. the probability of predicting a class, regardless of whether it's right or wrong)?

pierluigiferrari commented 6 years ago

@adamuas

  1. Multiple bounding boxes for the same object are not a matter of how well the model is trained; they are an inherent property of the SSD architecture. Regardless of how well your model is trained, there will very often be more than one high-confidence bounding box for the same object instance, which is why the output needs to be post-processed with NMS. As for the NMS settings, there is always a trade-off: if you decrease the iou_threshold for NMS, that will reduce the number of duplicate predictions for the same object (which is good), but at the same time, the lower you set the iou_threshold, the harder it becomes to distinguish two objects that are very close to each other or overlapping (which is bad). See the sketch after this list for a minimal illustration of that trade-off.
  2. I don't quite understand the second question. Are you asking what the probability for false positives is for a given trained model? There is no general answer to that.
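
For reference, a minimal sketch of greedy NMS that makes the iou_threshold trade-off in point 1 concrete (not this repo's implementation; boxes are assumed to be in (xmin, ymin, xmax, ymax) corner format):

```python
import numpy as np

def iou(box, boxes):
    """IoU of one box (xmin, ymin, xmax, ymax) against an array of boxes."""
    inter_w = np.maximum(0, np.minimum(box[2], boxes[:, 2]) - np.maximum(box[0], boxes[:, 0]))
    inter_h = np.maximum(0, np.minimum(box[3], boxes[:, 3]) - np.maximum(box[1], boxes[:, 1]))
    inter = inter_w * inter_h
    area_box = (box[2] - box[0]) * (box[3] - box[1])
    area_boxes = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_box + area_boxes - inter)

def greedy_nms(boxes, scores, iou_threshold=0.45):
    """Keep the highest-scoring box, drop every remaining box that overlaps
    it by more than `iou_threshold`, and repeat. A lower threshold removes
    more duplicates but also merges nearby distinct objects."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        best, rest = order[0], order[1:]
        keep.append(best)
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep
```
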
pierluigiferrari commented 6 years ago

@TanmoyDL I've just trained SSD7 on the Udacity road traffic dataset for 30 epochs to make sure that there is no bug somewhere in the code. I split the dataset into 18,000 training images and 4,241 validation images. Here is the result:

ssd7_loss_history (attached training/validation loss plot)

The validation loss decreases roughly at the same pace as the training loss, as expected. The prediction performance on the validation dataset was quite alright after only 15 epochs. I can't reproduce the issues you're having, but I have updated the ssd7_training.ipynb notebook, so you can check out the exact configuration I used there. I've now also included a train/val split for the dataset, so you should be able to reproduce my results exactly.

adamuas commented 6 years ago

@pierluigiferrari Thanks again for your response, I really appreciate it, and also for the weight sampling tutorial; it steered me toward the problem. I was mainly asking about the "hit rate" issue I raised in the other thread (the model not localizing or classifying objects in a lot of test images), only rephrased this time. I think I have an idea of what's going on. I believe I will have to clean up my custom dataset, because one of the annotators drew really large boxes: while inspecting the data I stumbled on a box around something the annotator probably thought was related but which was not really the class, and on boxes enclosing multiple instances of a class within one huge box. I will start off by subsampling the weights, though, and then move on to the dataset. I suspect this is causing a lot of inconsistency for the classes, and as a result the model is not learning to localize objects in many of the images. Do you also reckon this is likely the case, especially when dealing with a custom dataset?

pierluigiferrari commented 6 years ago

@adamuas It depends on how much bad data there is in your training dataset. The effect of bad training data on the learning process is somewhat proportional to the ratio of that bad data in the training dataset. For example, the Udacity road traffic dataset contains a bunch of bad data: There are images where the annotators literally drew boxes onto bushes or in the sky and labeled them "car". But the ratio of the bad data to the overall dataset is quite small, so the model still learns relatively well. But if 30 percent of your data is bad data, then I would expect the model to learn pretty much nothing. So it really depends on what fraction of your dataset is low-quality.

Instances like the one you mentioned where multiple instances of the same object class were labeled with one big bounding box around all of them definitely qualify as bad data. If things like that happen a lot, then that should definitely be cleaned up.

That being said, high-quality data really is the foundation of everything. All hyperparameter optimization and data augmentation is pointless if the input data is low-quality.

pierluigiferrari commented 6 years ago

Closing this for now since there seems to be no activity from the OP and half of the comments are unrelated questions (which would have been better raised as separate issues).

jnvipul commented 6 years ago

@TanmoyDL How do you check the validation loss using the TensorFlow object detection API? I can see the training loss but not the validation loss while running the training and eval jobs. What am I missing?

Thanks

wuchichung commented 6 years ago

you guys saved my project