hukaixuan19970627 / yolov5_obb

yolov5 + csl_label. (Oriented Object Detection)(Rotation Detection)(Rotated BBox) Rotated object detection based on yolov5
GNU General Public License v3.0

RuntimeError: CUDA out of memory #578

Open liamh1999 opened 1 year ago

liamh1999 commented 1 year ago

Is anyone else encountering this problem? It was working fine yesterday, but when I tried to use --cache today it failed, and even running without it I still hit this error. If anyone has a solution, thanks!

train: weights=, cfg=models/yolov5m.yaml, data=data/dota.yaml, hyp=data/hyps/obb/hyp.finetune_dota.yaml, epochs=20, batch_size=16, imgsz=1024, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, evolve=None, bucket=, cache=None, image_weights=False, device=0, multi_scale=False, single_cls=False, adam=False, sync_bn=False, workers=8, project=runs/train, name=exp, exist_ok=False, quad=False, linear_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github: Command 'git fetch && git config --get remote.origin.url' timed out after 5 seconds
YOLOv5 🚀 b00c3f2 torch 1.7.1+cu110 CUDA:0 (Tesla V100-SXM2-16GB, 16151MiB)

hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, theta=0.5, theta_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=180.0, translate=0.1, scale=0.25, shear=0.0, perspective=0.0, flipud=0.5, fliplr=0.5, mosaic=0.75, mixup=0.1, copy_paste=0.0, cls_theta=180, csl_radius=2.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5 🚀 runs (RECOMMENDED)
TensorBoard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=15

       from  n    params  module                                  arguments                     

0 -1 1 5280 models.common.Conv [3, 48, 6, 2, 2]
1 -1 1 41664 models.common.Conv [48, 96, 3, 2]
2 -1 4 111744 models.common.C3 [96, 96, 4]
3 -1 1 166272 models.common.Conv [96, 192, 3, 2]
4 -1 8 814848 models.common.C3 [192, 192, 8]
5 -1 1 664320 models.common.Conv [192, 384, 3, 2]
6 -1 12 4729344 models.common.C3 [384, 384, 12]
7 -1 1 2655744 models.common.Conv [384, 768, 3, 2]
8 -1 4 7087104 models.common.C3 [768, 768, 4]
9 -1 1 1476864 models.common.SPPF [768, 768, 5]
10 -1 1 295680 models.common.Conv [768, 384, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 [-1, 6] 1 0 models.common.Concat [1]
13 -1 4 1921536 models.common.C3 [768, 384, 4, False]
14 -1 1 74112 models.common.Conv [384, 192, 1, 1]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 4] 1 0 models.common.Concat [1]
17 -1 4 481536 models.common.C3 [384, 192, 4, False]
18 -1 1 332160 models.common.Conv [192, 192, 3, 2]
19 [-1, 14] 1 0 models.common.Concat [1]
20 -1 4 1774080 models.common.C3 [384, 384, 4, False]
21 -1 1 1327872 models.common.Conv [384, 384, 3, 2]
22 [-1, 10] 1 0 models.common.Concat [1]
23 -1 4 7087104 models.common.C3 [768, 768, 4, False]
24 [17, 20, 23] 1 808200 models.yolo.Detect [15, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [192, 384, 768]]
Model Summary: 567 layers, 31855464 parameters, 31855464 gradients, 76.8 GFLOPs

Scaled weight_decay = 0.0005
optimizer: SGD with parameter groups 123 weight, 126 weight (no decay), 126 bias
train: Scanning 'dataset/Datasets/Dota_Dataset/labelTxt/train.cache' images and labels... 1411 found, 0 missing, 2 empty, 0 corrupted: 100% 1411/1411 [00:00<00:00, 14685267.85it/s]
val: Scanning 'dataset/Datasets/Dota_Dataset/labelTxt/valid.cache' images and labels... 458 found, 0 missing, 2 empty, 0 corrupted: 100% 458/458 [00:00<00:00, 2928340.29it/s]
Plotting labels to runs/train/exp23/labels_xyls.jpg...

AutoAnchor: 3.61 anchors/target, 0.986 Best Possible Recall (BPR). Current anchors are a good fit to dataset ✅
Image sizes 1024 train, 1024 val
Using 8 dataloader workers
Logging results to runs/train/exp23
Starting training for 20 epochs...

 Epoch   gpu_mem       box       obj       cls     theta    labels  img_size

0% 0/89 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train.py", line 633, in <module>
    main(opt)
  File "train.py", line 530, in main
    train(opt.hyp, opt, device, callbacks)
  File "train.py", line 325, in train
    pred = model(imgs)  # forward
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/yolov5_obb/models/yolo.py", line 147, in forward
    return self._forward_once(x, profile, visualize)  # single-scale inference, train
  File "/content/drive/MyDrive/yolov5_obb/models/yolo.py", line 177, in _forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/content/drive/MyDrive/yolov5_obb/models/common.py", line 138, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
RuntimeError: CUDA out of memory. Tried to allocate 96.00 MiB (GPU 0; 15.77 GiB total capacity; 14.27 GiB already allocated; 74.12 MiB free; 14.45 GiB reserved in total by PyTorch)
0% 0/89 [00:50<?, ?it/s]
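The error message reports 14.27 GiB already allocated out of 15.77 GiB before the failed 96 MiB allocation, so it is worth confirming that nothing else (a crashed earlier run, a lingering notebook kernel) is still holding GPU memory before changing the training setup. A minimal check, assuming a standard NVIDIA driver install with `nvidia-smi` on the PATH:

```bash
# List the processes currently holding memory on the GPU; a stale python
# process from a previous run can easily account for most of the 16 GiB.
nvidia-smi

# Optionally watch memory in real time while the next training attempt starts.
watch -n 1 nvidia-smi
```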

zspzhangshoupeng commented 1 year ago

Try turning the training parameters down and running it again.
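A concrete way to follow that suggestion is to shrink the settings that dominate GPU memory: batch size, training image size, and, if needed, the model variant. The sketch below is only illustrative; the flag names follow the upstream YOLOv5 CLI and the reduced values are assumptions, so check `python train.py --help` in this fork before copying:

```bash
# Hypothetical lower-memory re-run: batch size cut from 16 to 4 and a smaller
# training resolution. Halve --batch-size again if the OOM persists, or switch
# --cfg to models/yolov5s.yaml for a lighter model.
python train.py \
  --cfg models/yolov5m.yaml \
  --data data/dota.yaml \
  --hyp data/hyps/obb/hyp.finetune_dota.yaml \
  --epochs 20 \
  --batch-size 4 \
  --imgsz 768 \
  --device 0
```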