Closed HaxThePlanet closed 4 years ago
Hello @HaxThePlanet, thank you for your interest in our work! Please visit our Custom Training Tutorial to get started, and see our Jupyter Notebook, Docker Image, and Google Cloud Quickstart Guide for example environments.
If this is a bug report, please provide screenshots and minimum viable code to reproduce your issue, otherwise we cannot help you.
If this is a custom model or data training question, please note that Ultralytics does not provide free personal support. As a leader in vision ML and AI, we do offer professional consulting, from simple expert advice up to delivery of fully customized, end-to-end production solutions for our clients.
For more information please visit https://www.ultralytics.com.
@HaxThePlanet good news: yolov5 supports multi-gpu out of the box. Some examples:
python train.py # will use ALL available cuda resources found on system
python train.py --device 0,1 # specify devices
python train.py --device 0 # specify 1 device
python train.py --device cpu # force cpu usage
test.py works exactly the same way. detect.py accepts a --device argument, but is limited to 1 GPU.
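To make the behavior of the empty, numeric, and cpu forms of --device concrete, here is a minimal sketch of how such a device string could be parsed into a device list. select_devices is a hypothetical helper for illustration only, not the actual yolov5 implementation:

```python
def select_devices(device: str = "", available: int = 8):
    """Parse a --device style string ("", "cpu", "0", "0,1") into device ids.

    `available` stands in for the number of CUDA devices detected on the
    system (what torch.cuda.device_count() would report).
    """
    if device == "cpu":
        return ["cpu"]                      # force CPU usage
    if device == "":
        # empty string -> use ALL available CUDA devices
        return [f"cuda:{i}" for i in range(available)]
    # comma-separated indices -> use exactly those devices
    return [f"cuda:{i.strip()}" for i in device.split(",")]
```

For example, `select_devices("0,1")` yields `["cuda:0", "cuda:1"]`, matching the `--device 0,1` invocation above.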
Excellent, thanks for the fast response and hard work. This thing is amazing!
When I type the command: python train.py --data coco.yaml --cfg yolov5s.yaml --weights '' --batch-size 16 it shows the following:

{'lr0': 0.01, 'momentum': 0.937, 'weight_decay': 0.0005, 'giou': 0.05, 'cls': 0.58, 'cls_pw': 1.0, 'obj': 1.0, 'obj_pw': 1.0, 'iou_t': 0.2, 'anchor_t': 4.0, 'fl_gamma': 0.0, 'hsv_h': 0.014, 'hsv_s': 0.68, 'hsv_v': 0.36, 'degrees': 0.0, 'translate': 0.0, 'scale': 0.5, 'shear': 0.0}
Namespace(adam=False, batch_size=16, bucket='', cache_images=False, cfg='./models/yolov5s.yaml', data='./data/coco.yaml', device='', epochs=300, evolve=False, img_size=[640, 640], multi_scale=False, name='', nosave=False, notest=False, rect=False, resume=False, single_cls=False, weights='')
Using CUDA Apex
device0 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device1 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device2 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device3 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device4 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device5 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device6 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
device7 _CudaDeviceProperties(name='GeForce RTX 2080 Ti', total_memory=11019MB)
Optimizer groups: 54 .bias, 60 conv.weight, 51 other
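The "Optimizer groups: 54 .bias, 60 conv.weight, 51 other" line reflects a common pattern: parameters are bucketed by name so that, e.g., biases and BatchNorm weights can skip weight decay. The following is a hedged sketch of that bucketing using a hypothetical group_params helper that operates on parameter names only; it is not the actual yolov5 code:

```python
def group_params(names):
    """Bucket parameter names into bias / conv-weight / other groups."""
    biases, conv_weights, other = [], [], []
    for n in names:
        if n.endswith(".bias"):
            biases.append(n)            # biases: typically no weight decay
        elif n.endswith("conv.weight"):
            conv_weights.append(n)      # conv weights: weight decay applied
        else:
            other.append(n)             # e.g. BatchNorm weights
    return biases, conv_weights, other
```

Each bucket would then be passed to the optimizer as a separate parameter group with its own weight_decay setting.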
Bug report as below:

/share/home/xx/anaconda3/envs/pt1.5.0/lib/python3.7/site-packages/torch/nn/parallel/distributed.py:303: UserWarning: Single-Process Multi-GPU is not the recommended mode for DDP. In this mode, each DDP instance operates on multiple devices and creates multiple module replicas within one process. The overhead of scatter/gather and GIL contention in every forward pass can slow down training. Please consider using one DDP instance per device or per module replica by explicitly setting device_ids or CUDA_VISIBLE_DEVICES. NB: There is a known issue in nn.parallel.replicate that prevents a single DDP instance to operate on multiple model replicas.
  "Single-Process Multi-GPU is not the recommended mode for "
Traceback (most recent call last):
  File "train.py", line 400, in <module>
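The warning above recommends one DDP process per GPU, each restricted to a single device via CUDA_VISIBLE_DEVICES. As a hedged illustration of that launch pattern (make_rank_env is a hypothetical helper; a real launch would use torch.distributed.launch or torchrun, which set these variables for you):

```python
import os

def make_rank_env(rank: int, world_size: int, master_port: int = 29500):
    """Build the environment for one single-GPU DDP worker process."""
    env = dict(os.environ)
    env.update({
        "CUDA_VISIBLE_DEVICES": str(rank),  # each process sees one GPU as cuda:0
        "RANK": str(rank),                  # global rank of this process
        "WORLD_SIZE": str(world_size),      # total number of processes
        "MASTER_ADDR": "127.0.0.1",         # rendezvous address
        "MASTER_PORT": str(master_port),    # rendezvous port
    })
    return env
```

Each of the N per-GPU processes would then be spawned with its own environment, avoiding the single-process multi-GPU mode the warning describes.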
@AIFAN-Lab thanks for the bug report. I tested on two GPUs today and everything worked well. Can you try to reproduce this in our docker image to see if it's an environment issue?
Ok. I will test the Docker. And report later.
Is it still necessary to train the first 1000 or so iterations on a single GPU?
@HaxThePlanet that's never been necessary.
@HaxThePlanet good news: yolov5 supports multi-gpu out of the box. Some examples:
python train.py # will use ALL available cuda resources found on system
python train.py --device 0,1 # specify devices
python train.py --device 0 # specify 1 device
python train.py --device cpu # force cpu usage
test.py works exactly the same way. detect.py accepts a --device argument, but is limited to 1 GPU.
Would you please support multiple GPUs in detect.py?
@liangshi036 we don't have the resources to implement suggestions, but you can do this yourself and submit a PR!
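One possible approach for such a PR, sketched here under assumptions (shard is a hypothetical helper, and running one detect process per GPU is only one of several designs): split the input image list into per-GPU chunks and run an independent single-GPU detect job on each chunk.

```python
def shard(items, num_devices):
    """Round-robin a list of inputs across num_devices buckets."""
    buckets = [[] for _ in range(num_devices)]
    for i, item in enumerate(items):
        buckets[i % num_devices].append(item)
    return buckets
```

For example, with 5 images and 2 GPUs, `shard(images, 2)` gives GPU 0 three images and GPU 1 two; each bucket could then be passed to a separate `detect.py --device N` process.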
🚀 Feature
Multiple GPU support
Motivation
Increased performance!
Pitch
I just bought a 3-way p100 box, come on please :)
Alternatives
Google Compute TPU support?
Additional context