hustvl / CrossVIS

[ICCV 2021] Crossover Learning for Fast Online Video Instance Segmentation
https://arxiv.org/abs/2104.05970

RuntimeError: CUDA error: device-side assert triggered #11

Closed Alxx999 closed 2 years ago

Alxx999 commented 2 years ago

Hello, I hit the following error during training. How can I fix it?

Before training, I edited adet/data/builtin.py so that the YouTube-VIS splits point at my local YouTube-VIS 2021 paths (the original entries are commented out):

_PERDEFINED_SPLITS_YOUTUBEVIS_VIDEO = {
    'youtubevis_train':
    # ('youtubevis/train/', 'youtubevis/annotations/train.json'),
    ('/media/lin/file/VIS/datasets/youtube-vis2021/train/JPEGImages', '/media/lin/file/VIS/datasets/youtube-vis2021/train/instances.json'),
    'youtubevis_valid':
    # ('youtubevis/valid/', 'youtubevis/annotations/valid.json'),
    ('/media/lin/file/VIS/datasets/youtube-vis2021/valid/JPEGImages', '/media/lin/file/VIS/datasets/youtube-vis2021/valid/instances.json'),
    'youtubevis_test':
    ('youtubevis/test/', 'youtubevis/annotations/test.json'),
}

metadata_youtubevis_video = { 'thing_classes': [ 'airplane', 'bear', 'bird', 'boat', 'car', 'cat', 'cow', 'deer', 'dog', 'duck', 'earless_seal', 'elephant', 'fish', 'flying_disc', 'fox', 'frog', 'giant_panda', 'giraffe', 'horse', 'leopard', 'lizard', 'monkey', 'motorbike', 'mouse', 'parrot', 'person', 'rabbit', 'shark', 'skateboard', 'snake', 'snowboard', 'squirrel', 'surfboard', 'tennis_racket', 'tiger', 'train', 'truck', 'turtle', 'whale', 'zebra' ] }

Then, in detectron2/data/datasets/builtin_meta.py, I registered VIS_CATEGORIES and modified _get_coco_instances_meta to use it:

def _get_coco_instances_meta():
    thing_ids = [k["id"] for k in VIS_CATEGORIES if k["isthing"] == 1]
    thing_colors = [k["color"] for k in VIS_CATEGORIES if k["isthing"] == 1]
    assert len(thing_ids) == 40, len(thing_ids)
    # Mapping from the incontiguous COCO category id to an id in [0, 79]
    thing_dataset_id_to_contiguous_id = {k: i for i, k in enumerate(thing_ids)}
    thing_classes = [k["name"] for k in VIS_CATEGORIES if k["isthing"] == 1]
    ret = {
        "thing_dataset_id_to_contiguous_id": thing_dataset_id_to_contiguous_id,
        "thing_classes": thing_classes,
        "thing_colors": thing_colors,
    }
    return ret

Then I changed NUM_CLASSES in adet/config/defaults.py to 40.
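A quick sanity check for the category changes described so far, as a minimal sketch: the contiguous-id mapping should cover exactly NUM_CLASSES categories, and every category_id used in the annotations should appear in it. The three-entry VIS_CATEGORIES list and the sample annotations are made-up stand-ins, and check_categories is a hypothetical helper, not part of detectron2 or AdelaiDet.

```python
# Hypothetical stand-in for the real 40-entry VIS_CATEGORIES list.
VIS_CATEGORIES = [
    {"id": 1, "name": "airplane", "isthing": 1, "color": [220, 20, 60]},
    {"id": 2, "name": "bear", "isthing": 1, "color": [119, 11, 32]},
    {"id": 3, "name": "bird", "isthing": 1, "color": [0, 0, 142]},
]
NUM_CLASSES = 3  # would be 40 for the full category list


def check_categories(categories, annotations, num_classes):
    """Return the dataset-id -> contiguous-id map, raising on any mismatch."""
    thing_ids = [c["id"] for c in categories if c["isthing"] == 1]
    if len(thing_ids) != num_classes:
        raise ValueError(
            f"{len(thing_ids)} categories but NUM_CLASSES={num_classes}"
        )
    id_map = {dataset_id: i for i, dataset_id in enumerate(thing_ids)}
    # Any category id used by an annotation must be present in the mapping.
    unknown = sorted({a["category_id"] for a in annotations} - set(id_map))
    if unknown:
        raise ValueError(f"category ids missing from the mapping: {unknown}")
    return id_map


annotations = [{"category_id": 1}, {"category_id": 3}]  # made-up sample
id_map = check_categories(VIS_CATEGORIES, annotations, NUM_CLASSES)
print(id_map)
```

A check like this only covers the class labels; it says nothing about other label spaces the model may use.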

My final training command is:

python tools/train_net.py --config configs/CrossVIS/R_50_1x.yaml MODEL.WEIGHTS CondInst_MS_R_50_1x.pth

The error is as follows:

[05/16 20:27:26 adet.data.common]: Serializing 89750 elements to byte tensors and concatenating them all ...
[05/16 20:27:26 adet.data.common]: Serialized dataset takes 216.92 MiB
[05/16 20:27:26 adet.data.build]: Using training sampler TrainingSampler
[05/16 20:27:26 fvcore.common.checkpoint]: [Checkpointer] Loading from CondInst_MS_R_50_1x.pth ...
WARNING [05/16 20:27:26 fvcore.common.checkpoint]: Skip loading parameter 'proposal_generator.fcos_head.cls_logits.weight' to the model due to incompatible shapes: (80, 256, 3, 3) in the checkpoint but (40, 256, 3, 3) in the model! You might want to double check if this is expected.
WARNING [05/16 20:27:26 fvcore.common.checkpoint]: Skip loading parameter 'proposal_generator.fcos_head.cls_logits.bias' to the model due to incompatible shapes: (80,) in the checkpoint but (40,) in the model! You might want to double check if this is expected.
WARNING [05/16 20:27:26 fvcore.common.checkpoint]: Some model parameters or buffers are not found in the checkpoint:
cls.{bias, weight}
mask_head._iter
proposal_generator.fcos_head.cls_logits.{bias, weight}
proposal_generator.fcos_head.reid_pred.{bias, weight}
proposal_generator.fcos_head.reid_pred_bn.{bias, running_mean, running_var, weight}
proposal_generator.fcos_head.reid_tower.0.{bias, weight}
proposal_generator.fcos_head.reid_tower.1.{bias, weight}
proposal_generator.fcos_head.reid_tower.10.{bias, weight}
proposal_generator.fcos_head.reid_tower.3.{bias, weight}
proposal_generator.fcos_head.reid_tower.4.{bias, weight}
proposal_generator.fcos_head.reid_tower.6.{bias, weight}
proposal_generator.fcos_head.reid_tower.7.{bias, weight}
proposal_generator.fcos_head.reid_tower.9.{bias, weight}
[05/16 20:27:26 adet.trainer]: Starting training from iteration 0
/pytorch/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:312: operator(): block: [0,0,0], thread: [32,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
[the same assertion is printed for every thread from [0,0,0] through [54,0,0] of block [0,0,0]]
Traceback (most recent call last):
  File "tools/train_net.py", line 231, in <module>
    args=(args, ),
  File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/detectron2/engine/launch.py", line 82, in launch
    main_func(*args)
  File "tools/train_net.py", line 219, in main
    return trainer.train()
  File "tools/train_net.py", line 97, in train
    self.train_loop(self.start_iter, self.max_iter)
  File "tools/train_net.py", line 87, in train_loop
    self.run_step()
  File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/detectron2/engine/defaults.py", line 494, in run_step
    self._trainer.run_step()
  File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/detectron2/engine/train_loop.py", line 285, in run_step
    losses.backward()
  File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/lin/.conda/envs/crossvis/lib/python3.7/site-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True)  # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered

HarryHsing commented 2 years ago

I ran into this error too. Have you found a fix?

vealocia commented 2 years ago

Hi, all! Thanks for your interest in our work. It seems both of you are trying to train CrossVIS on the YouTube-VIS 2021 dataset. Please change self.nID here to the number of identities in YouTube-VIS 2021 (larger than the default 3774); otherwise the identity loss will raise errors because target indices may fall out of bounds. Hope this helps!
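To illustrate the failure mode described above: an identity classifier with nID outputs can only accept labels in the range [0, nID). The following is a minimal pure-Python sketch of that bounds check. validate_identity_labels and the sample labels are hypothetical, and the actual CrossVIS identity loss code is not shown in this thread.

```python
def validate_identity_labels(labels, n_id):
    """Return the labels that fall outside the valid range [0, n_id)."""
    return [label for label in labels if not 0 <= label < n_id]


# Made-up identity labels; the larger ones are valid only if nID is big enough.
labels = [0, 17, 3773, 3774, 6282]

print(validate_identity_labels(labels, n_id=3774))  # [3774, 6282]
print(validate_identity_labels(labels, n_id=6283))  # []
```

When chasing this kind of error on a GPU, setting the environment variable CUDA_LAUNCH_BLOCKING=1 makes CUDA errors surface at the call that caused them rather than at a later, unrelated call such as backward().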

HarryHsing commented 2 years ago

Much appreciated!

Alxx999 commented 2 years ago

Hi, what value did you end up changing 3774 to?

HarryHsing commented 2 years ago

You can try 6283, it works for me

Alxx999 commented 2 years ago

Thanks

Alxx999 commented 2 years ago

If I use a VIS dataset I created myself, what should self.nID be set to?

HarryHsing commented 2 years ago

It should be the number of all the instances in your own training dataset.

HarryHsing commented 2 years ago

Check the ['annotations'] key in train.json: its length is the number of instances.
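As a rough sketch of that lookup, the snippet below counts the entries under 'annotations' and also reports the largest instance id. It assumes the YouTube-VIS JSON layout, where each entry in 'annotations' describes one instance track with an integer 'id'. The inline sample stands in for a real train.json, and count_identities is a hypothetical helper.

```python
import json


def count_identities(dataset):
    """Return (number of instance tracks, largest instance id) for a VIS dataset dict."""
    annotations = dataset["annotations"]
    max_id = max((ann["id"] for ann in annotations), default=0)
    return len(annotations), max_id


# Made-up miniature stand-in for train.json; a real file would be read with
# json.load() on an open file handle instead.
sample = json.loads("""
{
  "videos": [{"id": 1}, {"id": 2}],
  "annotations": [
    {"id": 1, "video_id": 1, "category_id": 26},
    {"id": 2, "video_id": 1, "category_id": 9},
    {"id": 3, "video_id": 2, "category_id": 5}
  ]
}
""")

num_instances, max_id = count_identities(sample)
print(num_instances, max_id)  # 3 3
```

If the instance ids are not contiguous, or if they are used directly as labels without remapping, self.nID may need to be at least max_id + 1. Which case applies depends on how CrossVIS maps ids, which this thread does not show.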

Alxx999 commented 2 years ago

Thank you so much