Hello @miloabolaffio, thank you for your interest in 🚀 YOLOv5! Please visit our ⭐️ Tutorials to get started, where you can find quickstart guides for simple tasks like Custom Data Training all the way to advanced concepts like Hyperparameter Evolution.
If this is a 🐛 Bug Report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom training ❓ Question, please provide as much information as possible, including dataset images, training logs, screenshots, and a public link to online W&B logging if available.
For business inquiries or professional support requests please visit https://www.ultralytics.com or email Glenn Jocher at glenn.jocher@ultralytics.com.
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/cuDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are currently passing. CI tests verify correct operation of YOLOv5 training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows, and Ubuntu every 24 hours and on every commit.
@miloabolaffio the main difference you are not accounting for is that the testloader operates with rect=True, whereas the trainloader operates with rect=False and mosaic enabled by default.
You should review your train and test jpgs in your logging directory to analyze differences, as these jpgs are literally what is being fed into the model. This is all explained in our Custom Training Tutorial, I would suggest you start there: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data
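To illustrate the rect difference: below is a minimal, self-contained sketch that approximates YOLOv5's letterbox padding (the real logic in utils/datasets.py computes per-batch shapes and is more involved; letterbox_shape is a hypothetical helper, not the repo's API):

```python
import math

def letterbox_shape(h, w, img_size=1024, stride=32, rect=True):
    # Scale the long side to img_size, then pad the short side either to
    # the next stride multiple (rect=True, minimal padding) or all the
    # way to a full square (rect=False).
    r = img_size / max(h, w)
    new_h, new_w = round(h * r), round(w * r)
    if rect:
        return math.ceil(new_h / stride) * stride, math.ceil(new_w / stride) * stride
    return img_size, img_size

print(letterbox_shape(768, 1024, rect=True))   # (768, 1024): minimal padding
print(letterbox_shape(768, 1024, rect=False))  # (1024, 1024): padded to square
```

Note that for square 1024x1024 inputs both settings produce the same shape, so any remaining train/val gap in that case comes from mosaic and the factors discussed below.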
@miloabolaffio I saw you used square images. There are still a number of unaccounted-for differences. The model is in .eval() mode when running inference in test.py and detect.py, and in .train() mode when training, which affects BatchNorm2d() layer execution among other things.
Also batch-size 2 is not recommended as this will lead to insufficient statistics for computing normalizations, resulting in reduced mAP.
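The BatchNorm difference is easy to verify with a standalone PyTorch snippet (not YOLOv5-specific):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(8)
x = torch.randn(2, 8, 32, 32)  # tiny batch of 2, as in the overfit test

bn.train()
y_train = bn(x)  # normalizes with this batch's mean/var and updates running stats

bn.eval()
y_eval = bn(x)   # normalizes with the accumulated running_mean/running_var

print(torch.allclose(y_train, y_eval))  # False: same input, different output
```

With batch size 2 the per-batch statistics are also very noisy, which is the insufficient-statistics problem mentioned above.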
Thanks a lot for your reply! We had already looked at the rect flag and at the jpgs (completely identical). Good point on the train/validation differences for batch normalization! Do you suggest that, in order to test overfitting, we:
We used batch-size 2 only to test whether the network was able to overfit; during normal training we used batch-size 16.
@miloabolaffio testing already uses the same batch size as training: https://github.com/ultralytics/yolov5/blob/7d629fde05c11b87d97fd937a2b66c99ec8d6865/train.py#L333-L343
But you are talking about two different things. Overfitting is not the same as producing identical train and val losses. Any model can overfit any data given enough training.
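If you want to quantify the mode difference directly, one option is to compute the loss on the same fixed batch in both modes. This is only a sketch: model, compute_loss, imgs, and targets stand in for objects train.py already builds, and exact signatures differ between YOLOv5 versions:

```python
import torch

model.train()
with torch.no_grad():
    pred = model(imgs)                    # train mode: raw per-layer outputs
    loss_train, _ = compute_loss(pred, targets)

model.eval()
with torch.no_grad():
    _, train_out = model(imgs)            # eval mode: (inference_out, train_out)
    loss_eval, _ = compute_loss(train_out, targets)

print(loss_train.item(), loss_eval.item())  # any gap comes from BN statistics
```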
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
❓ Question
We have a very difficult class in our dataset, and during training the network usually finishes with 80% mAP on the other classes and 0 on this one. To run some tests, we decided to try to overfit on only 2 images that contain just this difficult class, using them both in training and in validation. The images are 1024x1024 and we are using no data augmentation for either the train or the test dataloader (augment=False). We are using an input size of 1024x1024, batch size 2, and pretrained weights. Despite all these conditions, which should allow for exact loss matching between training and validation, we found:
Can you kindly give us an idea of why this happens or where we should look?
Apart from the difference between the train and validation loss, do you have any suggestions on how to handle a particularly difficult-to-predict class in the context of YOLOv5?
We have already:
We are planning to:
Additional context
Below are the plots of a training run we made. Command:
python3 train.py --img 1024 --batch 2 --epochs 5000 --data overfit.yaml --weights weights/yolov5s.pt
We also tried using --sync-bn (we are training on a couple of GTX 1080s), always obtaining disparities between train and validation loss.