Hello @miracle-fmh, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
Hello, can you tell me the speed difference between running on a single GPU vs multiple GPUs in your setup? I also see a performance and speed drop when using multiple GPUs.
@miracle-fmh use --notest for fastest training. 2-3 days for YOLOv5s is normal.
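For example, a command along these lines (assembled from the flags quoted elsewhere in this thread; treat it as a sketch rather than the exact command used):

python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64 --device 0 --notest   # --notest skips per-epoch evaluation for fastest training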
@miracle-fmh do not use bug labels for questions.
> Hello, can you tell me the speed difference between running on a single GPU vs multiple GPUs in your setup? I also see a performance and speed drop when using multiple GPUs.
2 GPUs: 13 min/epoch; 1 GPU: 17 min/epoch.
> 2 GPUs: 13 min/epoch; 1 GPU: 17 min/epoch.
Hmm, I'm not sure why I don't get the same performance. I ran yolov5s on the default coco128 for 100 epochs on 1 and 2 GPUs.
1 GPU: 0.175 h
2 GPUs: 0.203 h
Also, the accuracy for 1 GPU is a bit higher. Did I run it for too few epochs?
What about memory consumption? Could it be that you see a benefit because you used a larger dataset?
@NanoCode012 for speed you really want to test a larger dataset, i.e. COCO. You set your batch size to use up all your memory for 1 GPU, then for 2 GPUs you increase your batch size to again use up the available memory; typically the 2 GPU batch size will be about 2x the 1 GPU batch size. Much of the speedup comes from the larger batch sizes.
coco128 will not fully capture multi-GPU benefits because the dataset is simply too small.
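A minimal sketch of what that looks like on the command line, reusing the train.py flags quoted later in this thread (the batch-size numbers are illustrative, not measured memory limits):

python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64 --device 0     # 1 GPU: largest batch that fits in memory
python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 128 --device 0,1  # 2 GPUs: roughly 2x the 1 GPU batch size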
@glenn-jocher, thanks. I will finally take this chance to download the full COCO dataset. I would like to know whether a larger batch size would decrease performance. I read in a comment on darknet YOLOv4 that going from batch size 8 to 16 wouldn't change anything, but increasing it beyond that would decrease performance. Would the same apply here?
Because coco128 is very small, it was very easy to fit all the images in one batch, decreasing its performance.
> coco128 will not fully capture multi-GPU benefits because the dataset is simply too small.
This might explain why my tests on multi-process DDP aren't showing any improvement.
@NanoCode012 nominal batch size is 64. If you specify anything smaller than that, the code accumulates gradients until 64 images have been processed before each optimizer update. If you specify a higher value, it updates every batch. Performance will vary; you can see the batch sizes used for COCO in https://github.com/ultralytics/yolov5#reproduce-our-training
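To make the accumulation behaviour concrete, a sketch with the same flags as above (the batch-size values are again illustrative):

python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16 --device 0     # 16 < 64: gradients accumulate over 64/16 = 4 batches before each optimizer update
python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 128 --device 0,1  # 128 > 64: the optimizer updates every batch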
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Hello @glenn-jocher, thanks for your contribution. I have 2 questions:
By using the "python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64 --device 0" You said taht "The training time is not 2 days with V100GPU". In my experiment, for 1 epoch, it will take 15mins, so for training 300 epoch, it will take 3.125 days. So, How do you train the yolov5s network?
2. Using multiple GPUs, for example 2 GPUs with "python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 64 --device 0,1", the training speed is not 2x faster than training with 1 GPU card. Why?