About the training speed

miracle-fmh commented 4 years ago

Hello @glenn-jocher, thanks for your contribution. I have two questions:

1. Using "python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64 --device 0", you said that training takes 2 days on a V100 GPU. In my experiment, 1 epoch takes 15 min, so 300 epochs would take 3.125 days. How do you train the yolov5s network?
2. Using multiple GPUs, for example 2 GPUs with "python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64 --device 0,1", training is not 2x faster than with 1 GPU. Why?
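The arithmetic in the first question can be checked with a minimal sketch. The helper name is hypothetical, and it assumes every epoch takes the same wall-clock time.

```python
def total_training_days(minutes_per_epoch: float, epochs: int) -> float:
    """Project total training time in days, assuming a constant time per epoch."""
    total_minutes = minutes_per_epoch * epochs
    return total_minutes / (60 * 24)  # 1440 minutes per day


# 15 min/epoch on one V100 for 300 epochs (the figures from the question)
print(total_training_days(15, 300))  # 3.125
```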

github-actions[bot] commented 4 years ago

Hello @miracle-fmh, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.

If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.

If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients, such as:

Cloud-based AI systems operating on hundreds of HD video streams in realtime.
Edge AI integrated into custom iOS and Android apps for realtime 30 FPS video inference.
Custom data training, hyperparameter evolution, and model exportation to any destination.

For more information please visit https://www.ultralytics.com.

NanoCode012 commented 4 years ago

Hello, can you tell me the speed difference you see between running on a single GPU vs. multiple GPUs? I also see performance and speed drop when using multiple GPUs.

glenn-jocher commented 4 years ago

@miracle-fmh use --notest for the fastest training. 2-3 days for yolov5s is normal.

glenn-jocher commented 4 years ago

@miracle-fmh do not use bug labels for questions.

miracle-fmh commented 4 years ago

> Hello, can you tell me the speed difference you see between running on a single GPU vs. multiple GPUs? I also see performance and speed drop when using multiple GPUs.

2 GPUs: 13 min/epoch; 1 GPU: 17 min/epoch.
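These per-epoch times translate into a speedup and a scaling efficiency (speedup divided by the number of GPUs, where 100% would be perfect linear scaling). A minimal sketch with a hypothetical helper name, assuming the two timings are directly comparable:

```python
from typing import Tuple


def multi_gpu_scaling(t_single: float, t_multi: float, n_gpus: int) -> Tuple[float, float]:
    """Return (speedup, scaling efficiency) from per-epoch times on 1 GPU and on n GPUs."""
    speedup = t_single / t_multi   # > 1 means the multi-GPU run is faster
    efficiency = speedup / n_gpus  # 1.0 would be perfect linear scaling
    return speedup, efficiency


speedup, eff = multi_gpu_scaling(17, 13, 2)
print(f"speedup {speedup:.2f}x, efficiency {eff:.0%}")  # speedup 1.31x, efficiency 65%
```

By this measure the 2-GPU run reaches roughly 65% of ideal linear scaling.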

NanoCode012 commented 4 years ago

> 2 GPUs: 13 min/epoch; 1 GPU: 17 min/epoch.

Hmm, I'm not sure why I don't get the same speedup. I ran yolov5s on the default coco128 for 100 epochs on 1 and 2 GPUs.

1 GPU: 0.175 h
2 GPUs: 0.203 h
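Converting these totals to an average time per epoch makes them easier to compare with the per-epoch figures reported for COCO. A minimal sketch with a hypothetical helper name that divides the total run time by the number of epochs:

```python
def seconds_per_epoch(total_hours: float, epochs: int) -> float:
    """Convert a total run time in hours into average seconds per epoch."""
    return total_hours * 3600 / epochs


one_gpu = seconds_per_epoch(0.175, 100)  # about 6.3 s/epoch
two_gpu = seconds_per_epoch(0.203, 100)  # about 7.3 s/epoch
print(f"1 GPU: {one_gpu:.1f} s/epoch, 2 GPUs: {two_gpu:.1f} s/epoch")
print(f"2-GPU relative speed: {one_gpu / two_gpu:.2f}x")  # below 1.0, i.e. slower
```

Each coco128 epoch takes only a few seconds, and in this run the 2-GPU configuration ran at about 0.86x the speed of the 1-GPU run.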

Also, the accuracy with 1 GPU is a bit higher. Did I run it for too few epochs?

How about memory consumption? Could it be that you benefit from multiple GPUs because you used a large dataset?

glenn-jocher commented 4 years ago

@NanoCode012 for speed you really want to test a larger dataset, such as COCO. Set your batch size to use up all of your GPU memory on 1 GPU; then, for 2 GPUs, increase the batch size to again use up the available memory. Typically the 2-GPU batch size will be about 2x the 1-GPU batch size, and much of the speedup comes from these larger batch sizes.

coco128 will not fully capture multi-GPU benefits because the dataset is simply too small.
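One way to see why coco128 is too small to show a multi-GPU benefit is to count iterations per epoch. A minimal sketch with a hypothetical helper name, assuming 128 images for coco128, 118,287 images for COCO train2017, a final partial batch that is kept rather than dropped, and a total batch size that doubles from 64 to 128 when a second GPU is added:

```python
import math


def iterations_per_epoch(num_images: int, batch_size: int) -> int:
    """Number of batches needed to see every image once, counting a final partial batch."""
    return math.ceil(num_images / batch_size)


# Assumed dataset sizes: coco128 = 128 images, COCO train2017 = 118,287 images.
for name, n in [("coco128", 128), ("COCO train2017", 118_287)]:
    for bs in (64, 128):  # e.g. 1 GPU at batch 64 vs. 2 GPUs at about 2x that
        print(f"{name:15s} batch {bs:3d}: {iterations_per_epoch(n, bs)} iterations/epoch")
```

With coco128 an epoch is only one or two iterations, so fixed overheads such as process and dataloader startup plausibly dominate the run, while on COCO the doubled batch cuts each epoch from 1849 to 925 iterations.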

NanoCode012 commented 4 years ago

@glenn-jocher, thanks. I'll take this chance to finally download the full COCO dataset. I'd like to know whether a larger batch size would decrease performance. I read in a comment on Darknet YOLOv4 that going from batch size 8 to 16 made no difference, but increasing it further decreased performance. Would that apply here as well?

Because coco128 is so small, it was very easy to fit all of its images into one batch, which decreased its performance.

> coco128 will not fully capture multi-GPU benefits because the dataset is simply too small.

This might explain why my tests of multiprocess DDP aren't showing any benefit.

glenn-jocher commented 4 years ago

The nominal batch size is 64. If you specify a smaller batch size, the code accumulates gradients until 64 images have been processed before each optimizer update. If you specify a larger one, it updates every batch. Performance will vary; you can see the batch sizes used for COCO at https://github.com/ultralytics/yolov5#reproduce-our-training
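The accumulation rule described in this reply can be sketched as below. This is a minimal illustration with a hypothetical helper name; the `max(round(nbs / batch_size), 1)` expression is an assumption modeled on the description, so check yolov5's `train.py` for the exact logic:

```python
def accumulate_steps(batch_size: int, nominal_batch_size: int = 64) -> int:
    """Assumed rule: batches to accumulate gradients over before one optimizer step."""
    return max(round(nominal_batch_size / batch_size), 1)


for bs in (16, 32, 64, 128):
    k = accumulate_steps(bs)
    print(f"batch {bs:3d}: optimizer step every {k} batch(es), {bs * k} images per update")
```

Under this rule, batch sizes at or above 64 step every batch, and a batch size that does not divide 64 evenly is rounded, so updates then see approximately rather than exactly 64 images (for example, batch 24 accumulates 3 batches, or 72 images).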

github-actions[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

ultralytics / yolov5

About the training speed #282