zhihou7 / BatchFormer

CVPR2022, BatchFormer: Learning to Explore Sample Relationships for Robust Representation Learning, https://arxiv.org/abs/2203.01522

Error encountered when reproducing Deformable DETR BatchFormer #22

Closed cbn3 closed 1 year ago

cbn3 commented 1 year ago

CUDA error: device-side assert triggered
File "/root/autodl-tmp/project/deformable-detr-batchformer/models/matcher.py", line 81, in forward
    cost_class = pos_cost_class[:, tgt_ids] - neg_cost_class[:, tgt_ids]
File "/root/autodl-tmp/project/deformable-detr-batchformer/models/deformable_detr.py", line 342, in forward
    indices = self.matcher(outputs_without_aux, targets)
File "/root/autodl-tmp/project/deformable-detr-batchformer/engine.py", line 45, in train_one_epoch
    loss_dict = criterion(outputs, targets)
File "/root/autodl-tmp/project/deformable-detr-batchformer/main.py", line 282, in main
    train_stats = train_one_epoch(
File "/root/autodl-tmp/project/deformable-detr-batchformer/main.py", line 334, in <module>
    main(args)

What causes the error above, and how can I fix it?

cbn3 commented 1 year ago

CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
File "/root/autodl-tmp/project/deformable-detr-batchformer/util/box_ops.py", line 60, in generalized_box_iou
    assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
File "/root/autodl-tmp/project/deformable-detr-batchformer/models/matcher.py", line 87, in forward
    cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox),
File "/root/autodl-tmp/project/deformable-detr-batchformer/models/deformable_detr.py", line 342, in forward
    indices = self.matcher(outputs_without_aux, targets)
File "/root/autodl-tmp/project/deformable-detr-batchformer/engine.py", line 45, in train_one_epoch
    loss_dict = criterion(outputs, targets)
File "/root/autodl-tmp/project/deformable-detr-batchformer/main.py", line 282, in main
    train_stats = train_one_epoch(
File "/root/autodl-tmp/project/deformable-detr-batchformer/main.py", line 334, in <module>
    main(args)

The error above is what I get when the following line is not added:

os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
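
For readers following along, here is a minimal sketch of where such a line is usually placed. The variable is only picked up if it is set before CUDA is initialized, so the assumption here is that it goes at the very top of the entry script (presumably main.py in this project), ahead of `import torch`. The helper name `enable_blocking_launches` is hypothetical and not part of the repository.

```python
import os


def enable_blocking_launches() -> None:
    """Ask CUDA to run kernels synchronously so tracebacks point at the failing call."""
    # Assumption: this runs before the first CUDA call; the safest place
    # is before `import torch` in the entry script.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_blocking_launches()

# import torch  # assumed to come only after the variable has been set
```

Setting the variable in the shell when launching the script (`CUDA_LAUNCH_BLOCKING=1 python main.py ...`) should have the same effect. Synchronous launches slow training down, so this is for debugging only.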

zhihou7 commented 1 year ago

I suspect you would hit this issue even without adding BatchFormerV2. Also, have you tried it with a single GPU?

cbn3 commented 1 year ago

I used your deformable detr batchformer code directly. My own Deformable DETR runs fine. I ran main.py on a single GPU.

cbn3 commented 1 year ago

> I suspect you would hit this issue even without adding BatchFormerV2. Also, have you tried it with a single GPU?

The Deformable DETR code in batchformerv2 also runs fine for me.

zhihou7 commented 1 year ago

Could you provide your running scripts (hyper-parameters)?

The main change relative to the original Deformable-DETR is the following:

https://github.com/zhihou7/BatchFormer/blob/305efaa6c54a0cfd69c99a919e48deb9f84040d8/batchformer-v2/deformable-detr/models/deformable_transformer.py#L270-L292

cbn3 commented 1 year ago

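Hello, after increasing the number of classes by 1, that error no longer appears. Thank you for the quick reply! In the original Deformable DETR source code, I set the number of classes to 10 in the deformable detr file under models and it runs fine, but in the batchformerv2 version I have to set the number of classes to 11 for it to run. Why is that? My dataset has exactly 10 classes.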

zhihou7 commented 1 year ago

Possibly the dataset contains a class label of 10 or larger. But that does not explain why the original Deformable DETR does not suffer from this issue.

"CUDA error: device-side assert triggered" usually means that an index is out of range for the tensor, for example a[10] when a is an array of length 10.
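
To make the out-of-range case concrete, here is a minimal sketch of a label check that could be run on a batch before it reaches the matcher. It assumes targets are a list of dicts with a "labels" entry, as in DETR-style code, and that `num_classes` is the size of the class dimension of the logits. The function name `find_out_of_range_labels` is hypothetical.

```python
def find_out_of_range_labels(targets, num_classes):
    """Return (sample_index, label) pairs whose label cannot index num_classes logits."""
    bad = []
    for i, target in enumerate(targets):
        labels = target["labels"]
        # Accept tensors or arrays (anything with .tolist()) as well as plain lists.
        if hasattr(labels, "tolist"):
            labels = labels.tolist()
        for label in labels:
            if label < 0 or label >= num_classes:
                bad.append((i, int(label)))
    return bad


# Hypothetical targets whose labels run 1..10 instead of 0..9.
example_targets = [{"labels": [1, 5]}, {"labels": [10]}]
print(find_out_of_range_labels(example_targets, num_classes=10))  # [(1, 10)]
print(find_out_of_range_labels(example_targets, num_classes=11))  # []
```

If a check like this reports a label equal to 10, that would be consistent with the run only succeeding once the number of classes is set to 11.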

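I find it strange too. You could print the labels and take a look. My method only mixes features and does not modify any labels. Is it possible that this module failed and the network diverged? Print the values of the boxes in assert (boxes1[:, 2:] >= boxes1[:, :2]).all() and check whether they violate the assertion.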
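
A minimal sketch of the box check suggested above, assuming the boxes have already been converted to (x0, y0, x1, y1) format as `generalized_box_iou` expects. The helper `find_invalid_boxes` is hypothetical. NaN values are flagged as well, because a diverged network typically produces NaN boxes and any comparison with NaN is false, which also makes the assertion fail.

```python
import math


def find_invalid_boxes(boxes_xyxy):
    """Return indices of boxes that would fail `boxes[:, 2:] >= boxes[:, :2]`."""
    # Accept tensors or arrays (anything with .tolist()) as well as nested lists.
    if hasattr(boxes_xyxy, "tolist"):
        boxes_xyxy = boxes_xyxy.tolist()
    bad = []
    for i, (x0, y0, x1, y1) in enumerate(boxes_xyxy):
        has_nan = any(math.isnan(v) for v in (x0, y0, x1, y1))
        # A box is invalid if it contains NaN or its max corner is below its min corner.
        if has_nan or x1 < x0 or y1 < y0:
            bad.append(i)
    return bad


# Hypothetical boxes: one valid, one with x1 < x0, one all NaN.
example = [[0.1, 0.1, 0.5, 0.5], [0.6, 0.2, 0.4, 0.8], [float("nan")] * 4]
print(find_invalid_boxes(example))  # [1, 2]
```

Such a check would need to run before the failing assertion, since the CUDA context is generally unusable once a device-side assert has fired.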

cbn3 commented 1 year ago

OK, thank you!