In fast_rcnn.py, line 162 runs the following code:
roi_align_res = self.roi_align(img_feats['body4'], rois).type(images.dtype)
After this line runs, I get the following CUDA illegal memory access error when I try to print the bounding boxes tensor. The same error occurs later in the code on the first access to the variable boxes, but I pinpointed that it only starts happening once line 162 has run:
Traceback (most recent call last):
  File "vqa/mytrain_end2end.py", line 65, in <module>
    main()
  File "vqa/mytrain_end2end.py", line 57, in main
    rank, model = train_net(args, config)
  File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../vqa/function/mytrain.py", line 336, in train_net
    gradient_accumulate_steps=config.TRAIN.GRAD_ACCUMULATE_STEPS)
  File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/trainer.py", line 115, in train
    outputs, loss = net(*batch)
  File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/module.py", line 22, in forward
    return self.train_forward(*inputs, **kwargs)
  File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../vqa/modules/myresnet_vlbert_for_vqa.py", line 203, in train_forward
    segms=None)
  File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/gabriel/Desktop/Toby/VL-BERT/vqa/../common/fast_rcnn.py", line 163, in forward
    print("3", boxes)
  File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/tensor.py", line 179, in __repr__
    return torch._tensor_str._str(self)
  File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 372, in _str
    return _str_intern(self)
  File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 352, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 241, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/home/gabriel/anaconda3/envs/vl-bert/lib/python3.6/site-packages/torch/_tensor_str.py", line 89, in __init__
    nonzero_finite_vals = torch.masked_select(tensor_view, torch.isfinite(tensor_view) & tensor_view.ne(0))
RuntimeError: CUDA error: an illegal memory access was encountered
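Since CUDA kernels are launched asynchronously, an illegal memory access is often reported at a later, unrelated access (here, printing boxes) rather than at the call that caused it. One possible cause, not confirmed for this case, is an invalid entry in rois. Below is a minimal sketch of a CPU-side sanity check that could be run just before line 162. The helper name find_bad_rois is hypothetical, and the row layout (batch_index, x1, y1, x2, y2) is an assumption about how rois is built, not something taken from fast_rcnn.py.

```python
import math


def find_bad_rois(rois, batch_size):
    """Return (row_index, reason) pairs for ROI rows that look invalid.

    rois: sequence of rows [batch_index, x1, y1, x2, y2] (assumed layout),
          e.g. obtained with rois.detach().cpu().tolist()
    batch_size: number of images in the feature map,
          e.g. img_feats['body4'].shape[0]
    """
    problems = []
    for i, row in enumerate(rois):
        if len(row) != 5:
            problems.append((i, "expected 5 values per ROI"))
            continue
        batch_index, x1, y1, x2, y2 = row
        if not all(math.isfinite(v) for v in row):
            # NaN or inf anywhere in the row
            problems.append((i, "non-finite value"))
        elif batch_index != int(batch_index) or not 0 <= batch_index < batch_size:
            # a batch index outside the feature map would read out of bounds
            problems.append((i, "batch index out of range"))
        elif x2 < x1 or y2 < y1:
            problems.append((i, "inverted box"))
    return problems
```

It could be called as find_bad_rois(rois.detach().cpu().tolist(), img_feats['body4'].shape[0]) right before the roi_align call; an empty list means none of these particular checks failed. Running the script with the environment variable CUDA_LAUNCH_BLOCKING=1 also makes kernel launches synchronous, so the traceback should point at the call that actually triggers the error.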
My environment is as follows:
Is there a different ROI Align implementation I could use instead, or some other way to work around this issue? Thanks for the help.