tusen-ai / Anchor3DLane

Official PyTorch implementation of the paper "Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection", accepted by CVPR 2023

Training on the OpenLane dataset #20

Open onionysy opened 1 year ago

onionysy commented 1 year ago

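We are training on the public OpenLane dataset, but partway through training we hit a CUDA error. We suspect that the lane categories in the dataset exceed the 21-class limit. However, when processing the dataset with openlane.py, we saw that out-of-range categories are already remapped: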

                if lane_results['category'] >= 21:
                    lane_results['category'] = 20

We no longer know where the problem is. Do you have any suggestions? The error output is as follows:

2023-07-28 00:20:42,672 - mmseg - INFO - Exp name: anchor3dlane_iter.py
2023-07-28 00:20:42,673 - mmseg - INFO - Iter [3000/60000] lr: 2.000e-04, eta: 10:10:30, time: 0.642, data_time: 0.014, memory: 11367, batch_positives: 12.7812, batch_negatives: 450.0000, cls_loss: 0.1614, reg_losses_x: 0.0256, reg_losses_z: 0.0040, reg_losses_vis: 0.0297, liou_losses_x: 0.3897, liou_losses_z: 0.2364, cls_loss0: 0.0699, reg_losses_x0: 0.0508, reg_losses_z0: 0.0053, reg_losses_vis0: 0.0249, liou_losses_x0: 0.5498, liou_losses_z0: 0.2677, loss: 1.8151
2023-07-28 00:20:49,113 - mmseg - INFO - Iter [3010/60000] lr: 2.000e-04, eta: 10:10:24, time: 0.644, data_time: 0.013, memory: 11367, batch_positives: 13.5938, batch_negatives: 450.0000, cls_loss: 0.1474, reg_losses_x: 0.0386, reg_losses_z: 0.0039, reg_losses_vis: 0.0316, liou_losses_x: 0.3700, liou_losses_z: 0.2279, cls_loss0: 0.0648, reg_losses_x0: 0.0706, reg_losses_z0: 0.0048, reg_losses_vis0: 0.0262, liou_losses_x0: 0.5458, liou_losses_z0: 0.2555, loss: 1.7871
2023-07-28 00:20:55,559 - mmseg - INFO - Iter [3020/60000] lr: 2.000e-04, eta: 10:10:18, time: 0.645, data_time: 0.014, memory: 11367, batch_positives: 13.1438, batch_negatives: 450.0000, cls_loss: 0.1464, reg_losses_x: 0.0312, reg_losses_z: 0.0052, reg_losses_vis: 0.0316, liou_losses_x: 0.3480, liou_losses_z: 0.2309, cls_loss0: 0.0611, reg_losses_x0: 0.0473, reg_losses_z0: 0.0065, reg_losses_vis0: 0.0261, liou_losses_x0: 0.5236, liou_losses_z0: 0.2592, loss: 1.7170
2023-07-28 00:21:01,921 - mmseg - INFO - Iter [3030/60000] lr: 2.000e-04, eta: 10:10:10, time: 0.636, data_time: 0.014, memory: 11367, batch_positives: 11.4375, batch_negatives: 450.0000, cls_loss: 0.1548, reg_losses_x: 0.0307, reg_losses_z: 0.0042, reg_losses_vis: 0.0278, liou_losses_x: 0.3571, liou_losses_z: 0.2288, cls_loss0: 0.0599, reg_losses_x0: 0.0616, reg_losses_z0: 0.0067, reg_losses_vis0: 0.0242, liou_losses_x0: 0.5342, liou_losses_z0: 0.2702, loss: 1.7603
2023-07-28 00:21:08,344 - mmseg - INFO - Iter [3040/60000] lr: 2.000e-04, eta: 10:10:04, time: 0.642, data_time: 0.014, memory: 11367, batch_positives: 13.1125, batch_negatives: 450.0000, cls_loss: 0.1414, reg_losses_x: 0.0200, reg_losses_z: 0.0052, reg_losses_vis: 0.0308, liou_losses_x: 0.3512, liou_losses_z: 0.2308, cls_loss0: 0.0537, reg_losses_x0: 0.0501, reg_losses_z0: 0.0058, reg_losses_vis0: 0.0270, liou_losses_x0: 0.5265, liou_losses_z0: 0.2559, loss: 1.6984
2023-07-28 00:21:14,719 - mmseg - INFO - Iter [3050/60000] lr: 2.000e-04, eta: 10:09:56, time: 0.637, data_time: 0.014, memory: 11367, batch_positives: 13.2000, batch_negatives: 450.0000, cls_loss: 0.1403, reg_losses_x: 0.0277, reg_losses_z: 0.0052, reg_losses_vis: 0.0311, liou_losses_x: 0.3714, liou_losses_z: 0.2311, cls_loss0: 0.0684, reg_losses_x0: 0.0540, reg_losses_z0: 0.0072, reg_losses_vis0: 0.0258, liou_losses_x0: 0.5408, liou_losses_z0: 0.2696, loss: 1.7728
2023-07-28 00:21:21,130 - mmseg - INFO - Iter [3060/60000] lr: 2.000e-04, eta: 10:09:49, time: 0.641, data_time: 0.013, memory: 11367, batch_positives: 11.3625, batch_negatives: 450.0000, cls_loss: 0.1447, reg_losses_x: 0.0208, reg_losses_z: 0.0035, reg_losses_vis: 0.0295, liou_losses_x: 0.3702, liou_losses_z: 0.2274, cls_loss0: 0.0565, reg_losses_x0: 0.0476, reg_losses_z0: 0.0047, reg_losses_vis0: 0.0268, liou_losses_x0: 0.5384, liou_losses_z0: 0.2676, loss: 1.7376
2023-07-28 00:21:27,589 - mmseg - INFO - Iter [3070/60000] lr: 2.000e-04, eta: 10:09:44, time: 0.646, data_time: 0.014, memory: 11367, batch_positives: 13.2188, batch_negatives: 450.0000, cls_loss: 0.1481, reg_losses_x: 0.0324, reg_losses_z: 0.0038, reg_losses_vis: 0.0312, liou_losses_x: 0.3801, liou_losses_z: 0.2369, cls_loss0: 0.0596, reg_losses_x0: 0.0729, reg_losses_z0: 0.0042, reg_losses_vis0: 0.0266, liou_losses_x0: 0.5654, liou_losses_z0: 0.2595, loss: 1.8206
2023-07-28 00:21:33,933 - mmseg - INFO - Iter [3080/60000] lr: 2.000e-04, eta: 10:09:36, time: 0.634, data_time: 0.013, memory: 11367, batch_positives: 13.8812, batch_negatives: 450.0000, cls_loss: 0.1477, reg_losses_x: 0.0295, reg_losses_z: 0.0069, reg_losses_vis: 0.0318, liou_losses_x: 0.3902, liou_losses_z: 0.2495, cls_loss0: 0.0649, reg_losses_x0: 0.0831, reg_losses_z0: 0.0071, reg_losses_vis0: 0.0274, liou_losses_x0: 0.5694, liou_losses_z0: 0.2682, loss: 1.8756
2023-07-28 00:21:40,287 - mmseg - INFO - Iter [3090/60000] lr: 2.000e-04, eta: 10:09:28, time: 0.635, data_time: 0.013, memory: 11367, batch_positives: 13.5938, batch_negatives: 450.0000, cls_loss: 0.1450, reg_losses_x: 0.0237, reg_losses_z: 0.0068, reg_losses_vis: 0.0308, liou_losses_x: 0.3682, liou_losses_z: 0.2500, cls_loss0: 0.0605, reg_losses_x0: 0.0485, reg_losses_z0: 0.0093, reg_losses_vis0: 0.0261, liou_losses_x0: 0.5408, liou_losses_z0: 0.2832, loss: 1.7929
2023-07-28 00:21:46,753 - mmseg - INFO - Iter [3100/60000] lr: 2.000e-04, eta: 10:09:22, time: 0.647, data_time: 0.015, memory: 11367, batch_positives: 13.6750, batch_negatives: 450.0000, cls_loss: 0.1374, reg_losses_x: 0.0236, reg_losses_z: 0.0057, reg_losses_vis: 0.0305, liou_losses_x: 0.3791, liou_losses_z: 0.2349, cls_loss0: 0.0578, reg_losses_x0: 0.0576, reg_losses_z0: 0.0067, reg_losses_vis0: 0.0271, liou_losses_x0: 0.5623, liou_losses_z0: 0.2624, loss: 1.7851
2023-07-28 00:21:53,178 - mmseg - INFO - Iter [3110/60000] lr: 2.000e-04, eta: 10:09:16, time: 0.642, data_time: 0.013, memory: 11367, batch_positives: 13.3875, batch_negatives: 450.0000, cls_loss: 0.1396, reg_losses_x: 0.0203, reg_losses_z: 0.0043, reg_losses_vis: 0.0323, liou_losses_x: 0.3550, liou_losses_z: 0.2296, cls_loss0: 0.0614, reg_losses_x0: 0.0441, reg_losses_z0: 0.0054, reg_losses_vis0: 0.0289, liou_losses_x0: 0.5231, liou_losses_z0: 0.2590, loss: 1.7030
2023-07-28 00:21:59,601 - mmseg - INFO - Iter [3120/60000] lr: 2.000e-04, eta: 10:09:09, time: 0.642, data_time: 0.013, memory: 11367, batch_positives: 13.2500, batch_negatives: 450.0000, cls_loss: 0.1420, reg_losses_x: 0.0206, reg_losses_z: 0.0036, reg_losses_vis: 0.0315, liou_losses_x: 0.3702, liou_losses_z: 0.2274, cls_loss0: 0.0663, reg_losses_x0: 0.0586, reg_losses_z0: 0.0050, reg_losses_vis0: 0.0270, liou_losses_x0: 0.5430, liou_losses_z0: 0.2599, loss: 1.7553
2023-07-28 00:22:06,054 - mmseg - INFO - Iter [3130/60000] lr: 2.000e-04, eta: 10:09:03, time: 0.645, data_time: 0.014, memory: 11367, batch_positives: 12.5625, batch_negatives: 450.0000, cls_loss: 0.1473, reg_losses_x: 0.0194, reg_losses_z: 0.0051, reg_losses_vis: 0.0305, liou_losses_x: 0.3650, liou_losses_z: 0.2466, cls_loss0: 0.0599, reg_losses_x0: 0.0498, reg_losses_z0: 0.0066, reg_losses_vis0: 0.0257, liou_losses_x0: 0.5442, liou_losses_z0: 0.2827, loss: 1.7829
2023-07-28 00:22:12,533 - mmseg - INFO - Iter [3140/60000] lr: 2.000e-04, eta: 10:08:58, time: 0.648, data_time: 0.014, memory: 11367, batch_positives: 12.8063, batch_negatives: 450.0000, cls_loss: 0.1401, reg_losses_x: 0.0304, reg_losses_z: 0.0041, reg_losses_vis: 0.0299, liou_losses_x: 0.3633, liou_losses_z: 0.2325, cls_loss0: 0.0563, reg_losses_x0: 0.0659, reg_losses_z0: 0.0052, reg_losses_vis0: 0.0265, liou_losses_x0: 0.5352, liou_losses_z0: 0.2644, loss: 1.7539
2023-07-28 00:22:19,005 - mmseg - INFO - Iter [3150/60000] lr: 2.000e-04, eta: 10:08:52, time: 0.647, data_time: 0.014, memory: 11367, batch_positives: 12.8063, batch_negatives: 450.0000, cls_loss: 0.1518, reg_losses_x: 0.0198, reg_losses_z: 0.0054, reg_losses_vis: 0.0323, liou_losses_x: 0.3584, liou_losses_z: 0.2361, cls_loss0: 0.0587, reg_losses_x0: 0.0531, reg_losses_z0: 0.0068, reg_losses_vis0: 0.0268, liou_losses_x0: 0.5368, liou_losses_z0: 0.2713, loss: 1.7572
2023-07-28 00:22:25,480 - mmseg - INFO - Iter [3160/60000] lr: 2.000e-04, eta: 10:08:47, time: 0.648, data_time: 0.014, memory: 11367, batch_positives: 11.9625, batch_negatives: 450.0000, cls_loss: 0.1476, reg_losses_x: 0.0203, reg_losses_z: 0.0039, reg_losses_vis: 0.0286, liou_losses_x: 0.3592, liou_losses_z: 0.2322, cls_loss0: 0.0604, reg_losses_x0: 0.0450, reg_losses_z0: 0.0062, reg_losses_vis0: 0.0247, liou_losses_x0: 0.5252, liou_losses_z0: 0.2661, loss: 1.7194
2023-07-28 00:22:31,967 - mmseg - INFO - Iter [3170/60000] lr: 2.000e-04, eta: 10:08:41, time: 0.649, data_time: 0.014, memory: 11367, batch_positives: 13.4187, batch_negatives: 450.0000, cls_loss: 0.1473, reg_losses_x: 0.0182, reg_losses_z: 0.0046, reg_losses_vis: 0.0329, liou_losses_x: 0.3593, liou_losses_z: 0.2498, cls_loss0: 0.0609, reg_losses_x0: 0.0429, reg_losses_z0: 0.0058, reg_losses_vis0: 0.0275, liou_losses_x0: 0.5267, liou_losses_z0: 0.2846, loss: 1.7605
2023-07-28 00:22:38,414 - mmseg - INFO - Iter [3180/60000] lr: 2.000e-04, eta: 10:08:35, time: 0.645, data_time: 0.014, memory: 11367, batch_positives: 12.9000, batch_negatives: 450.0000, cls_loss: 0.1436, reg_losses_x: 0.0247, reg_losses_z: 0.0040, reg_losses_vis: 0.0299, liou_losses_x: 0.3473, liou_losses_z: 0.2335, cls_loss0: 0.0534, reg_losses_x0: 0.0536, reg_losses_z0: 0.0048, reg_losses_vis0: 0.0250, liou_losses_x0: 0.5128, liou_losses_z0: 0.2604, loss: 1.6928
2023-07-28 00:22:44,884 - mmseg - INFO - Iter [3190/60000] lr: 2.000e-04, eta: 10:08:30, time: 0.647, data_time: 0.014, memory: 11367, batch_positives: 14.3000, batch_negatives: 450.0000, cls_loss: 0.1401, reg_losses_x: 0.0227, reg_losses_z: 0.0049, reg_losses_vis: 0.0356, liou_losses_x: 0.3705, liou_losses_z: 0.2517, cls_loss0: 0.0573, reg_losses_x0: 0.0490, reg_losses_z0: 0.0060, reg_losses_vis0: 0.0310, liou_losses_x0: 0.5568, liou_losses_z0: 0.2878, loss: 1.8134
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [8,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [13,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [18,0,0] Assertion `idx_dim >= 0 && idx_dim < index_size && "index out of bounds"` failed.
Traceback (most recent call last):
  File "/snap/pycharm-community/342/plugins/python-ce/helpers/pydev/pydevd.py", line 1500, in _exec
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "/snap/pycharm-community/342/plugins/python-ce/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/buaa/songyue/Anchor3DLane-main/tools/train.py", line 364, in <module>
    main()
  File "/home/buaa/songyue/Anchor3DLane-main/tools/train.py", line 354, in main
    train(
  File "/home/buaa/songyue/Anchor3DLane-main/tools/train.py", line 242, in train
    runner.run(data_loaders, cfg.workflow)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 144, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 64, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 477, in train_step
    losses, other_vars = self(**data_batch)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 398, in forward
    return self.forward_train(img, mask, img_metas, **kwargs)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 116, in new_func
    return old_func(*args, **kwargs)
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 448, in forward_train
    losses, other_vars = self.loss(output, gt_3dlanes, output_aux)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 205, in new_func
    return old_func(*args, **kwargs)
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/lane_detector/anchor_3dlane.py", line 411, in loss
    anchor_losses = self.lane_loss(proposals_list, gt_3dlanes)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/lane_loss.py", line 137, in forward
    cls_loss = focal_loss(cls_pred, cls_target)
  File "/home/buaa/anaconda3/envs/lane3d/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/kornia_focal.py", line 145, in forward
    return focal_loss(input, target, self.alpha, self.gamma, self.reduction, self.eps)
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/kornia_focal.py", line 84, in focal_loss
    target_one_hot: torch.Tensor = one_hot(target, num_classes=input.shape[1], device=input.device, dtype=input.dtype) # [b, c, h, w]
  File "/home/buaa/songyue/Anchor3DLane-main/mmseg/models/losses/kornia_focal.py", line 50, in one_hot
    return one_hot.scatter_(1, labels.unsqueeze(1), 1.0) + eps
RuntimeError: CUDA error: device-side assert triggered
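The assertion in the log comes from the `scatter_` call inside `one_hot`, which requires every label index to satisfy `0 <= idx < index_size`. A minimal pure-Python sketch of that bounds condition is shown here; `checked_one_hot` is a hypothetical helper written for illustration (it is not part of the repository) and it uses plain lists in place of tensors.

```python
from typing import List, Sequence


def checked_one_hot(labels: Sequence[int], num_classes: int) -> List[List[float]]:
    """One-hot encode labels, rejecting any index outside [0, num_classes).

    Hypothetical helper: mirrors the `idx >= 0 && idx < index_size`
    condition from the CUDA assertion, but fails with a readable message.
    """
    # Collect (position, value) for every label that would be out of bounds.
    bad = [(i, lab) for i, lab in enumerate(labels) if not 0 <= lab < num_classes]
    if bad:
        raise ValueError(f"labels out of range for {num_classes} classes: {bad}")

    rows = []
    for lab in labels:
        row = [0.0] * num_classes
        row[lab] = 1.0  # safe: lab was verified to be a valid index above
        rows.append(row)
    return rows
```

A comparable check on `cls_target` before the focal loss would likely report the offending value directly, rather than as an asynchronous device-side assert. Note that negative labels fail the condition just as labels that are too large do.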

spyflying commented 1 year ago

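This is most likely because one sample has a label that is out of range. The line of code above that modifies the label merges the left and right curb categories; it does not check for out-of-range data. You can go through the data once and simply delete any samples that are out of range.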
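A pass over the data of the kind suggested here might look like the sketch below. The sample layout (a list of dicts, each with a `lanes` list whose entries carry an integer `category`) and the helper name `split_by_category_range` are assumptions made for illustration; the structure actually produced by `openlane.py` may differ. The limit of 21 classes is taken from the question.

```python
from typing import Any, Dict, List, Tuple

NUM_CATEGORIES = 21  # assumed from the thread: valid category ids are 0..20

Sample = Dict[str, Any]


def split_by_category_range(
    samples: List[Sample], num_categories: int = NUM_CATEGORIES
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples into (kept, dropped) by lane category validity.

    A sample is dropped if any of its lanes has a category outside
    [0, num_categories). The 'lanes' and 'category' keys are hypothetical.
    """
    kept: List[Sample] = []
    dropped: List[Sample] = []
    for sample in samples:
        categories = [lane["category"] for lane in sample.get("lanes", [])]
        if all(0 <= c < num_categories for c in categories):
            kept.append(sample)
        else:
            dropped.append(sample)
    return kept, dropped
```

Returning the dropped samples as well makes it possible to inspect which frames carried the bad labels before discarding them.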

onionysy commented 1 year ago

Merging the left and right curbs? Are you referring to lines 251-252 of Anchor3DLane-main/tools/convert_datasets/openlane.py? That seems different from how I understood it.

onionysy commented 1 year ago

[image]

BinBin962464 commented 6 months ago

Hello, have you managed to solve this problem?