Closed buaazsj closed 3 years ago
@buaazsj there are a few issues open regarding --resume that talk about this, you might want to search the issues a bit.
Is there any update on this issue? I've been facing the same problem: low mAP when training in a 2- or 4-GPU setting. Training on 1 GPU works perfectly fine.
Edit: when training in a multi-GPU setting, the training losses (GIoU, cls and obj) are the same as in the 1-GPU setting. It's only the validation loss and mAP that are reduced.
Edit 2: OK, so this is weird. Evaluation via test.test during multi-GPU training gives low mAP. BUT, if you separately evaluate the checkpoint that was saved to disk for the same epoch, by running python test.py ...., you get the correct mAP! By correct, I mean the mAP is similar to training in the 1-GPU setting. I'm testing this on yolov3-tiny.cfg. Will report more when training finishes in 1-2 days.
@akshaychawla hello, thank you for your interest in our work! This issue seems to lack the minimum requirements for a proper response, or is insufficiently detailed for us to help you. Please note that most technical problems are due to:
Your changes to the default repository. If your issue is not reproducible in a fresh git clone of this repository we cannot debug it. Before going further, run this code and ensure your issue persists:
sudo rm -rf yolov5 # remove existing
git clone https://github.com/ultralytics/yolov5 && cd yolov5 # clone latest
python detect.py # verify detection
# CODE TO REPRODUCE YOUR ISSUE HERE
Your custom data. If your issue is not reproducible with COCO or COCO128 data we cannot debug it. Visit our Custom Training Tutorial for guidelines on training your custom data. Examine train_batch0.jpg and test_batch0.jpg for a sanity check of training and testing data.
Your environment. If your issue is not reproducible in one of the verified environments below we can not debug it. If you are running YOLOv5 locally, ensure your environment meets all of the requirements.txt dependencies specified below.
If none of these apply to you, we suggest you close this issue and raise a new one using the Bug Report template, providing screenshots and minimum viable code to reproduce your issue. Thank you!
Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.6. To install run:
$ pip install -r requirements.txt
YOLOv5 may be run in any of the following up-to-date verified environments (with all dependencies including CUDA/CUDNN, Python and PyTorch preinstalled):
If this badge is green, all YOLOv5 GitHub Actions Continuous Integration (CI) tests are passing. These tests evaluate proper operation of basic YOLOv5 functionality, including training (train.py), testing (test.py), inference (detect.py) and export (export.py) on macOS, Windows and Ubuntu.
I'm running into the same issue. Have you solved it?
Ultralytics has open-sourced YOLOv5 at https://github.com/ultralytics/yolov5, featuring faster, lighter and more accurate object detection. YOLOv5 is recommended for all new projects.
GPU Speed measures end-to-end time per image averaged over 5000 COCO val2017 images using a V100 GPU with batch size 32, and includes image preprocessing, PyTorch FP16 inference, postprocessing and NMS. EfficientDet data from google/automl at batch size 8.
Model | APval | APtest | AP50 | SpeedGPU | FPSGPU | params | FLOPS
---|---|---|---|---|---|---|---
YOLOv5s | 37.0 | 37.0 | 56.2 | 2.4ms | 416 | 7.5M | 13.2B
YOLOv5m | 44.3 | 44.3 | 63.2 | 3.4ms | 294 | 21.8M | 39.4B
YOLOv5l | 47.7 | 47.7 | 66.5 | 4.4ms | 227 | 47.8M | 88.1B
YOLOv5x | 49.2 | 49.2 | 67.7 | 6.9ms | 145 | 89.0M | 166.4B
YOLOv5x + TTA | 50.8 | 50.8 | 68.9 | 25.5ms | 39 | 89.0M | 354.3B
YOLOv3-SPP | 45.6 | 45.5 | 65.2 | 4.5ms | 222 | 63.0M | 118.0B
APtest denotes COCO test-dev2017 server results, all other AP results in the table denote val2017 accuracy.
All AP numbers are for single-model single-scale without ensemble or test-time augmentation. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.001
SpeedGPU measures end-to-end time per image averaged over 5000 COCO val2017 images using a GCP n1-standard-16 instance with one V100 GPU, and includes image preprocessing, PyTorch FP16 image inference at --batch-size 32 --img-size 640, postprocessing and NMS. Average NMS time included in this chart is 1-2ms/img. Reproduce by python test.py --data coco.yaml --img 640 --conf 0.1
All checkpoints are trained to 300 epochs with default settings and hyperparameters (no autoaugmentation).
Test Time Augmentation (TTA) runs at 3 image sizes. Reproduce by python test.py --data coco.yaml --img 832 --augment
For more information and to get started with YOLOv5 please visit https://github.com/ultralytics/yolov5. Thank you!
Here are the results:
Issue: when training with multiple GPUs under DDP, validation mAP is lower than expected.
Observation: if we test a serialized version of the DDP model (i.e. save it to disk with torch.save and load it back with torch.load), the validation mAP is fine.
Fix: during training, instead of running test.test on the model currently being trained under DDP, write a temporary checkpoint to disk and have test build a new model, load that checkpoint, and run validation with it.
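The workaround can be sketched roughly like this. This is a minimal sketch, not the repo's actual code: build_model and run_test are hypothetical stand-ins for the repository's model constructor and test.test.

```python
import torch
import torch.nn as nn

def evaluate_via_checkpoint(model, build_model, run_test, device="cpu"):
    # Unwrap DDP (if wrapped) so the checkpoint holds plain module weights.
    module = model.module if isinstance(
        model, nn.parallel.DistributedDataParallel) else model
    tmp = "tmp_eval.pt"
    torch.save({"model": module.state_dict()}, tmp)

    # Rebuild a fresh model, load the checkpoint, and validate with that
    # instead of the live DDP-wrapped training model.
    fresh = build_model().to(device)
    fresh.load_state_dict(torch.load(tmp, map_location=device)["model"])
    fresh.eval()  # every registered submodule switches to eval mode
    with torch.no_grad():
        return run_test(fresh)
```

The key property is that the freshly built model has no DDP wrapper and no stale attribute references, so eval() reliably reaches all of its layers.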
Why does it work? No idea; just dumb luck, I guess. If I had to guess, when we move model.module.yolo_layers to model.yolo_layers after DDP init and then switch to eval with model.eval(), eval mode may not be getting switched on for the yolo_layers, since they live outside model.module. Or it might have something to do with the test dataloader.
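That eval-mode hypothesis is easy to demonstrate in isolation. This toy reproduction is not the repo's actual code, but it shows the suspected mechanism: nn.Module.eval() only recurses into registered child modules, so submodules stashed in a plain Python attribute are silently skipped.

```python
import torch.nn as nn

class Wrapper(nn.Module):
    def __init__(self):
        super().__init__()
        # Registered child: eval()/train() will recurse into it.
        self.inner = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))

w = Wrapper()
# A plain list attribute is NOT registered with nn.Module bookkeeping
# (unlike nn.ModuleList), so eval() never reaches these layers.
w.detached_layers = [nn.BatchNorm2d(8)]
w.eval()
print(w.inner[1].training)            # False: registered child switched
print(w.detached_layers[0].training)  # True: stayed in training mode
```

If yolo_layers ended up referenced outside the registered module tree after DDP init, they would likewise keep their training-mode behavior during validation.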
The minor changes required for train.py & test.py can be viewed here: https://github.com/akshaychawla/yolov3/commit/4e5eaab4907e037579f6690296fe3d556c621d4e
Another observation: when I increased the batch size to 128 and trained the model, I expected it to perform worse, since a higher batch size means fewer gradient updates and therefore usually lower performance. BUT, this repository supports gradient accumulation via loss *= batch_size / 64, which I think was originally designed for cases where batch_size < 64. For batch_size > 64 it has the effect of scaling up the loss, which is the same as scaling up the learning rate hyp['lr0'] to accommodate the higher batch size. More importantly, it sped up training:
#gpu | Time to 300 epochs (hrs) |
---|---|
1 | 42.843 |
2 | 36.740 |
4 | 26.455 |
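The accumulation/scaling behavior described above can be sketched like this (illustrative names and a toy model, not the repo's exact training loop):

```python
import torch
import torch.nn as nn

nominal = 64      # the nominal batch size the loss is normalized to
batch_size = 128
# For batch_size < 64, gradients are accumulated over ~64/batch_size
# batches before an optimizer step, so the effective batch is ~64.
# For batch_size >= 64 no accumulation occurs (accumulate == 1).
accumulate = max(round(nominal / batch_size), 1)

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)

for i in range(4):
    x, y = torch.randn(batch_size, 10), torch.randn(batch_size, 1)
    loss = nn.functional.mse_loss(model(x), y)
    loss *= batch_size / nominal   # > 1 here, so gradients (and hence the
                                   # effective learning rate) scale up
    loss.backward()
    if (i + 1) % accumulate == 0:  # optimizer step every `accumulate` batches
        opt.step()
        opt.zero_grad()
```

With batch_size = 128 the factor is 2, so each update behaves roughly like doubling hyp['lr0'] while halving the number of updates per epoch.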
Edit: logs, results and checkpoints are available at https://drive.google.com/drive/folders/11CN50wq-e0COr9RnWPtBHgj8z-AfFBq6?usp=sharing
@akshaychawla thanks so much for the detailed analysis! It looks like you've put in a lot of work and arrived at a very useful insight.
I would highly recommend you try YOLOv5; it has a multitude of feature additions, improvements and bug fixes above and beyond this YOLOv3 repo. DDP is functioning correctly there: we use it to train the largest official YOLOv5x model with no problems. https://github.com/ultralytics/yolov5
@glenn-jocher Thank YOU for building, open-sourcing and then maintaining this and the yolov5 repository!
This is a very well written piece of software and I have learned so much from it. I would love to transition to YOLOv5, but reviewer 3 will ask me to compare to "existing state of the art" before a borderline reject, so my hands are tied to v3 for now 🤷♂️
@glenn-jocher I trained YOLOv3-SPP in the yolov3 repo and YOLOv5x in the yolov5 repo on my cls.cfg dataset, and got the following results: yolov3 is better than yolov5. Why? yolov3:
yolov5:
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
@goldwater668 It's great to hear about the comparison! YOLOv3 in the YOLOv5 repository features an updated architecture that may outperform YOLOv5 in certain scenarios. YOLOv5, however, offers a significant number of improvements and optimizations over YOLOv3. I would recommend reviewing the recent updates and optimizations in YOLOv5 to ensure an apples-to-apples comparison. Thank you!
❔Question
When training with multiple GPUs, my validation mAP value is very low, but when training is interrupted and then resumed, it becomes normal. Why?
Additional context