Hello @miloabolaffio, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/cuDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@miloabolaffio the main difference you are not accounting for is that the testloader operates with rect=True, whereas the trainloader operates with rect=False and mosaic enabled by default.
You should review your train and test jpgs in your logging directory to analyze differences, as these jpgs are literally what is being fed into the model. This is all explained in our Custom Training Tutorial, I would suggest you start there: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
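To illustrate the rect difference: below is a minimal, self-contained sketch that approximates YOLOv5's letterbox padding (the real logic in utils/datasets.py computes per-batch shapes and is more involved; letterbox_shape is a hypothetical helper, not the repo's API):

```python
import math

def letterbox_shape(h, w, img_size=1024, stride=32, rect=True):
    # Scale the long side to img_size, then pad the short side either to
    # the next stride multiple (rect=True, minimal padding) or all the
    # way to a full square (rect=False).
    r = img_size / max(h, w)
    new_h, new_w = round(h * r), round(w * r)
    if rect:
        return math.ceil(new_h / stride) * stride, math.ceil(new_w / stride) * stride
    return img_size, img_size

print(letterbox_shape(768, 1024, rect=True))   # (768, 1024): minimal padding
print(letterbox_shape(768, 1024, rect=False))  # (1024, 1024): padded to square
```

Note that for square 1024x1024 inputs both settings produce the same shape, so any remaining train/val gap in that case comes from mosaic and the factors discussed below.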
@miloabolaffio I saw you used square images. There are still a number of unaccounted-for differences. The model is in .eval() mode when running inference in test.py and detect.py, and in .train() mode when training, which affects BatchNorm2d() layer execution among other things.
Also batch-size 2 is not recommended as this will lead to insufficient statistics for computing normalizations, resulting in reduced mAP.
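The BatchNorm difference is easy to verify with a standalone PyTorch snippet (not YOLOv5-specific):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = torch.randn(2, 8, 32, 32)  # tiny batch of 2, as in the overfit test

bn.train()
y_train = bn(x)  # normalizes with this batch's mean/var and updates running stats

bn.eval()
y_eval = bn(x)   # normalizes with the accumulated running_mean/running_var

print(torch.allclose(y_train, y_eval))  # False: same input, different output
```

With batch size 2 the per-batch statistics are also very noisy, which is the insufficient-statistics problem mentioned above.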
Thanks a lot for your reply! We had already looked at the rect flag and at the jpgs (completely identical). Good point on the train/validation differences for batch normalization! Do you suggest that, in order to test overfitting, we:
We used batch-size 2 only to test whether the network was able to overfit; during normal training we used batch-size 16.
@miloabolaffio testing already uses the same batch size as training: https://github.com/ultralytics/yolov5/blob/7d629fde05c11b87d97fd937a2b66c99ec8d6865/train.py#L333-L343
But you are talking about two different things. Overfitting is not the same as producing identical train and val losses. Any model can overfit any data given enough training.
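If you want to quantify the mode difference directly, one option is to compute the loss on the same fixed batch in both modes. This is only a sketch: model, compute_loss, imgs, and targets stand in for objects train.py already builds, and exact signatures differ between YOLOv5 versions:

```python
import torch

model.train()
with torch.no_grad():
    pred = model(imgs)                    # train mode: raw per-layer outputs
    loss_train, _ = compute_loss(pred, targets)

model.eval()
with torch.no_grad():
    _, train_out = model(imgs)            # eval mode: (inference_out, train_out)
    loss_eval, _ = compute_loss(train_out, targets)

print(loss_train.item(), loss_eval.item())  # any gap comes from BN statistics
```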
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
❓ Question
We have a very difficult class in our dataset, and during training the network usually finishes with 80% mAP on the other classes and 0 on this one. To run some tests, we decided to try to overfit on only 2 images that contain just this difficult class, using them both in training and in validation. The images are 1024x1024 and we are using no data augmentation for either the train or the test dataloader (augment=False). We are using an input size of 1024x1024, batch size 2, and pretrained weights. Despite all these conditions, which should allow for exact loss matching between training and validation, we found:
Can you kindly give us an idea of why this happens or where we should look?
Apart from the difference between the train and validation loss, do you have any suggestions on how to handle a particularly difficult-to-predict class in the context of YOLOv5?
We have already:
We are planning to:
Additional context
Below are the plots of a training run we made. Command:
python3 train.py --img 1024 --batch 2 --epochs 5000 --data overfit.yaml --weights weights/yolov5s.pt
We also tried using --sync-bn (we are training on a couple of GTX 1080s), always obtaining disparities between train and validation loss.