I am having trouble starting training for this project. I have verified that Detectron2 is installed correctly, since I can train the official examples, and I built the MaskFormer op according to this. Could you please release the environment requirements along with the running instructions? It would be even better if you could give some advice on addressing the following issue.
When running training with this model, I encountered the following errors:
cu11.7+torch1.31
/opt/conda/conda-bld/pytorch_1670525552843/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [2,0,0], thread: [79,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
ERROR [07/15 16:06:35 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
...
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/maskformer.py", line 356, in forward
losses = self.criterion(outputs, targets, depths)
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/criterion.py", line 555, in forward
indices = self.matcher(outputs_without_aux, targets, depths)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
return forward_call(*input, **kwargs)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/matcher.py", line 324, in forward
return self.memory_efficient_forward(outputs, targets, depths)
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
return func(*args, **kwargs)
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/matcher.py", line 250, in memory_efficient_forward
point_coords = torch.rand(1, self.num_points, 2, device=out_mask.device)
RuntimeError: CUDA error: device-side assert triggered
cu12.1+torch2.31
/opt/conda/conda-bld/pytorch_1716905979055/work/aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [0,0,0], thread: [90,0,0] Assertion `-sizes[i] <= index && index < sizes[i] && "index out of bounds"` failed.
...
ERROR [07/15 16:20:00 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/maskformer.py", line 356, in forward
losses = self.criterion(outputs, targets, depths)
...
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/criterion.py", line 555, in forward
indices = self.matcher(outputs_without_aux, targets, depths)
...
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/matcher.py", line 324, in forward
return self.memory_efficient_forward(outputs, targets, depths)
File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/test/ws/panoptic/depth/image-based/DeepDPS/model/maskformer/matcher.py", line 201, in memory_efficient_forward
cost_class = -out_prob[:, tgt_ids]
RuntimeError: CUDA error: device-side assert triggered
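For what it's worth, the second trace stops at `cost_class = -out_prob[:, tgt_ids]`, and the kernel assertion says an index is outside `[-size, size)`. My guess (not confirmed) is that some target label id is larger than the number of classes the model predicts, for example an ignore label or a mismatch between the dataset and the configured class count. The first trace stops at `torch.rand`, but CUDA reports errors asynchronously, so that line may not be the real failure site; rerunning with `CUDA_LAUNCH_BLOCKING=1` should show whether both traces come from the same indexing step. Below is a minimal sketch of the check I plan to add just before that line. The helper name `find_out_of_range_labels` is hypothetical and not part of the repository.

```python
def find_out_of_range_labels(label_ids, num_classes):
    """Return the sorted distinct label ids that cannot index a dimension
    of size `num_classes`.

    Mirrors the bound in the failing assertion:
    -size <= index < size.
    """
    bad = set()
    for raw in label_ids:
        idx = int(raw)
        if not (-num_classes <= idx < num_classes):
            bad.add(idx)
    return sorted(bad)


# Assumed usage inside the matcher, before `out_prob[:, tgt_ids]`
# (tensor names taken from the traceback above):
#   bad = find_out_of_range_labels(tgt_ids.tolist(), out_prob.shape[-1])
#   assert not bad, f"label ids out of range: {bad}"
```

If this reports any ids, the problem would be in the labels or the class-count setting rather than in the installation.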
Hi, thank you for the impressive work!