yzqxy / Yolov8_obb_Prune_Track

GNU General Public License v3.0
176 stars 13 forks

Fine-tuning after pruning fails with a custom dataset (32 classes) #38

Open mmpp406 opened 4 months ago

mmpp406 commented 4 months ago

The error message is as follows:

```
     Epoch   gpu_mem       box       cls       dfl    labels  img_size
  0%|          | 0/32 [00:00<?, ?it/s]
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [384,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [384,0,0], thread: [1,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
... (the same assertion repeats for threads [2,0,0] through [31,0,0] of block [384,0,0])
  0%|          | 0/32 [00:00<?, ?it/s]
++++++++++++++++++pred:3 torch.Size([32, 67, 80, 80])+++++++++++++++
++++++++++++++++++target:133 torch.Size([7])+++++++++++++++
```


```
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
File /data1/WJ/yolov8-obb-jianzhi/prune_finetune.py:648, in <module>
    646 if __name__ == "__main__":
    647     opt = parse_opt()
--> 648     main(opt)

File /data1/WJ/yolov8-obb-jianzhi/prune_finetune.py:545, in main(opt, callbacks)
    543     # Train
    544     if not opt.evolve:
--> 545         train(opt.hyp, opt, device, callbacks)
    546         if WORLD_SIZE > 1 and RANK == 0:
    547             LOGGER.info('Destroying process group... ')

File /data1/WJ/yolov8-obb-jianzhi/prune_finetune.py:330, in train(hyp, opt, device, callbacks)
    328 print(f"++++++++++++++++++pred:{len(pred)} {pred[0].shape}+++++++++++++++")
    329 print(f"++++++++++++++++++target:{len(targets)} {targets[0].shape}+++++++++++++++")
--> 330 loss, loss_items = compute_loss(pred, targets.to(device))  # loss scaled by batch_size
    333 # if int(epoch) >= int(epochs/2):
    334 #     loss, loss_items = compute_loss(pred, targets.to(device), 'l1')  # loss scaled by batch_size
    335 # else:
    336 #     loss, loss_items = compute_loss(pred, targets.to(device), 'l2')  # loss scaled by batch_size
    338 if RANK != -1:

File /data1/WJ/yolov8-obb-jianzhi/utils/loss.py:101, in v8DetectionLoss.__call__(self, p, targets, model_l)
     98 mask_gt = gt_bboxes.sum(2, keepdim=True).gt(0)  # torch.Size([16, 2, 1])
    100 # TAL dynamic matching
--> 101 target_labels, target_bboxes, target_scores, fg_mask, _ = self.assigner(
    102     pred_scores.detach().sigmoid(), (pred_bboxes.detach() * stride_tensor).type(gt_bboxes.dtype),
    103     anchor_points * stride_tensor, gt_labels, gt_bboxes, mask_gt)
    106 target_scores_sum = max(target_scores.sum(), 1)
    107 target_labels = torch.where(target_scores > 0, 1, 0)

File ~/anaconda3/envs/wj_pytorch/lib/python3.9/site-packages/torch/nn/modules/module.py:1501, in Module._call_impl(self, *args, **kwargs)
   1496 # If we don't have any hooks, we want to skip the rest of the logic in
   1497 # this function, and just call forward.
   1498 if not (self._backward_hooks or self._backward_pre_hooks or self._forward_hooks or self._forward_pre_hooks
   1499         or _global_backward_pre_hooks or _global_backward_hooks
   1500         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1501     return forward_call(*args, **kwargs)
   1502 # Do not call functions when jit is used
   1503 full_backward_hooks, non_full_backward_hooks = [], []

File ~/anaconda3/envs/wj_pytorch/lib/python3.9/site-packages/torch/utils/_contextlib.py:115, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    112 @functools.wraps(func)
    113 def decorate_context(*args, **kwargs):
    114     with ctx_factory():
--> 115         return func(*args, **kwargs)

File /data1/WJ/yolov8-obb-jianzhi/utils/tal.py:208, in TaskAlignedAssigner.forward(self, pd_scores, pd_bboxes, anc_points, gt_labels, gt_bboxes, mask_gt)
    202     device = gt_bboxes.device
    203     return (torch.full_like(pd_scores[..., 0], self.bg_idx).to(device), torch.zeros_like(pd_bboxes).to(device),
    204             torch.zeros_like(pd_scores).to(device), torch.zeros_like(pd_scores[..., 0]).to(device),
    205             torch.zeros_like(pd_scores[..., 0]).to(device))
--> 208 mask_pos, align_metric, overlaps = self.get_pos_mask(pd_scores, pd_bboxes, gt_labels, gt_bboxes, anc_points,
    209                                                      mask_gt)
    211 target_gt_idx, fg_mask, mask_pos = select_highest_overlaps(mask_pos, overlaps, self.n_max_boxes)
    212 # assigned target

File /data1/WJ/yolov8-obb-jianzhi/utils/tal.py:231, in TaskAlignedAssigner.get_pos_mask(self, pd_scores, pd_bboxes, gt_labels, gt_bboxes, anc_points, mask_gt)
    228 mask_in_gts = check_points_in_rotated_boxes(anc_points, gt_bboxes)
    230 # get anchor_align metric, (b, max_num_obj, h*w) [16, 2, 8400], [16, 2, 8400]
--> 231 align_metric, overlaps = self.get_box_metrics(pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_in_gts * mask_gt)
    233 mask_topk = self.select_topk_candidates(align_metric, topk_mask=mask_gt.expand(-1, -1, self.topk).bool())
    234 # merge all mask to a final mask, (b, max_num_obj, h*w), mask_gt=torch.Size([16, 2, 1])

File /data1/WJ/yolov8-obb-jianzhi/utils/tal.py:250, in TaskAlignedAssigner.get_box_metrics(self, pd_scores, pd_bboxes, gt_labels, gt_bboxes, mask_gt)
    248 ind[1] = gt_labels.squeeze(-1)  # b, max_num_obj
    249 # Get the scores of each grid for each gt cls
--> 250 bbox_scores[mask_gt] = pd_scores[ind[0], :, ind[1]][mask_gt]  # b, max_num_obj, h*w
    252 # (b, max_num_obj, 1, 4), (b, 1, h*w, 4)
    253 pd_boxes = pd_bboxes.unsqueeze(1).expand(-1, self.n_max_boxes, -1, -1)[mask_gt]

RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
```
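The assert fires while the assigner uses the ground-truth class ids to index the class dimension of the predictions (`tal.py:250` above), so one low-effort check is to confirm that every label id in a batch lies inside `[0, nc)` before the loss is computed. Rerunning once with `CUDA_LAUNCH_BLOCKING=1`, as the error message itself suggests, also makes the reported failing line accurate. Below is a minimal sketch, assuming the usual YOLOv5-style targets layout where column 1 is the class id (the debug print shows targets of shape (133, 7)); the helper name is mine, not part of the repo:

```python
import torch

def check_batch_labels(targets: torch.Tensor, nc: int) -> None:
    """Raise if any ground-truth class id in a YOLOv5-style targets tensor
    (column 1 = class id) falls outside [0, nc)."""
    if targets.numel() == 0:
        return
    cls_ids = targets[:, 1].long()
    lo, hi = int(cls_ids.min()), int(cls_ids.max())
    print(f"batch class ids: min={lo}, max={hi}, expected nc={nc}")
    if lo < 0 or hi >= nc:
        raise ValueError(f"class id outside [0, {nc}): min={lo}, max={hi}")

# Hypothetical usage: call right before compute_loss(...) in train(),
# e.g. check_batch_labels(targets, nc=model.nc)
```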

mmpp406 commented 4 months ago

Is some parameter hard-coded in the fine-tuning code?
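One quick way to check whether a class count is baked into the pruned weights is to inspect the checkpoint directly. A sketch, assuming a YOLOv5/YOLOv8-style checkpoint that stores the model object under a `model` key and exposes an `nc` attribute; this repo may save it differently:

```python
import torch

# Inspect the class count the pruned checkpoint was built with (hypothetical
# structure; run from the repo root so the pickled model classes can be imported).
ckpt = torch.load('prune/pruned_model.pt', map_location='cpu')
model = ckpt.get('model', ckpt) if isinstance(ckpt, dict) else ckpt
print('checkpoint nc:', getattr(model, 'nc', 'unknown'))
```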

yzqxy commented 4 months ago

> Is some parameter hard-coded in the fine-tuning code?

If the model can be pruned, fine-tuning should be fine. Are you using the latest code?

mmpp406 commented 4 months ago

> Is some parameter hard-coded in the fine-tuning code?
>
> If the model can be pruned, fine-tuning should be fine. Are you using the latest code?

Yes, I'm using the latest code. The fine-tuning command is: `python prune_finetune.py --weights prune/pruned_model.pt --data 'hrsc.yaml' --epochs 100 --imgsz 640 --batch-size 16`
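Since the custom dataset has 32 classes, it may also be worth confirming that `hrsc.yaml` actually declares all of them. A minimal sketch, assuming the usual YOLOv5-style data yaml with `nc` and `names` fields:

```python
import yaml

# Sanity-check the data config passed via --data (field names assume the usual
# YOLOv5-style layout; adjust if this repo's yaml differs).
with open('hrsc.yaml', 'r', encoding='utf-8') as f:
    data = yaml.safe_load(f)

names = data.get('names', [])
nc = data.get('nc', len(names))
print(f"hrsc.yaml: nc={nc}, names listed={len(names)}")
if names and nc != len(names):
    print("warning: nc does not match the number of class names")
```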

yzqxy commented 4 months ago

> Is some parameter hard-coded in the fine-tuning code?
>
> If the model can be pruned, fine-tuning should be fine. Are you using the latest code?
>
> Yes, I'm using the latest code. The fine-tuning command is: `python prune_finetune.py --weights prune/pruned_model.pt --data 'hrsc.yaml' --epochs 100 --imgsz 640 --batch-size 16`

You can try the previous version of the rotated-box code; that one was tested and works fine. After the latest version integrated the rotated-box + keypoint code, I haven't tested whether the pruning part still works. Once I have time to finish merging rotated box + segmentation, I'll test everything together.