THCudaCheck Fail illegal memory access

austinmw commented 4 years ago

Hi, I'm trying to run this on EC2, so I modified demo.py to remove the imshow and waitKey calls. When I run

python demo_no_output.py tracking,ddd --load_model ../models/nuScenes_3Dtracking.pth --dataset nuscenes --pre_hm --track_thresh 0.1 --demo ../videos/nuscenes_mini.mp4 --save_video

I get the following output and error:

Running tracking
Using tracking threshold for out threshold! 0.1
Fix size testing.
training chunk_sizes: [32]
input h w: 448 800
heads {'hm': 10, 'reg': 2, 'wh': 2, 'tracking': 2, 'dep': 1, 'rot': 8, 'dim': 3, 'amodel_offset': 2}
weights {'hm': 1, 'reg': 1, 'wh': 0.1, 'tracking': 1, 'dep': 1, 'rot': 1, 'dim': 1, 'amodel_offset': 1}
head conv {'hm': [256], 'reg': [256], 'wh': [256], 'tracking': [256], 'dep': [256], 'rot': [256], 'dim': [256], 'amodel_offset': [256]}
Creating model...
Using node type: (<class 'model.networks.dla.DeformConv'>, <class 'model.networks.dla.DeformConv'>)
Warning: No ImageNet pretrain!!
loaded ../models/nuScenes_3Dtracking.pth, epoch 70
OpenCV: FFMPEG: tag 0x44495658/'XVID' is not supported with codec id 12 and format 'mp4 / MP4 (MPEG-4 Part 14)'
OpenCV: FFMPEG: fallback to use tag 0x7634706d/'mp4v'
Skip imshow
Initialize tracking!
error in modulated_deformable_im2col_cuda: invalid device function
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=700 : an illegal memory access was encountered
Traceback (most recent call last):
  File "demo_no_output.py", line 119, in <module>
    demo(opt)
  File "demo_no_output.py", line 65, in demo
    ret = detector.run(img, input_meta)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/detector.py", line 102, in run
    images, self.pre_images, pre_hms, pre_inds, return_time=True)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/detector.py", line 301, in process
    output = self.model(images, pre_images, pre_hms)[-1]
  File "/home/ec2-user/anaconda3/envs/CenterTrack/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/model/networks/base_model.py", line 75, in forward
    feats = self.imgpre2feats(x, pre_img, pre_hm)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/model/networks/dla.py", line 633, in imgpre2feats
    x = self.dla_up(x)
  File "/home/ec2-user/anaconda3/envs/CenterTrack/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/model/networks/dla.py", line 572, in forward
    ida(layers, len(layers) -i - 2, len(layers))
  File "/home/ec2-user/anaconda3/envs/CenterTrack/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/model/networks/dla.py", line 545, in forward
    layers[i] = node(layers[i] + layers[i - 1])
  File "/home/ec2-user/anaconda3/envs/CenterTrack/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/model/networks/dla.py", line 516, in forward
    x = self.conv(x)
  File "/home/ec2-user/anaconda3/envs/CenterTrack/lib/python3.6/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ec2-user/centertrack/CenterTrack/src/lib/model/networks/DCNv2/dcn_v2.py", line 121, in forward
    offset = torch.cat((o1, o2), dim=1)
RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /opt/conda/conda-bld/pytorch_1579022034529/work/aten/src/THC/THCCachingHostAllocator.cpp:278
[1]    18140 segmentation fault  python demo_no_output.py tracking,ddd --load_model  --dataset nuscenes   0.1

I'm using Python 3.6.10. I followed the install directions and built DCN with the make.sh file. I found the same error reported in the DCNv2 issues: https://github.com/CharlesShang/DCNv2/issues/35 Any help is greatly appreciated!

I tried this on two different EC2 instances and got the same error on both:

Amazon Linux 2 AMI with 4x V100
Ubuntu 18.04 with 1x K80

acdart commented 4 years ago

You should compile DCN with a newer version of gcc, such as 5.x.

austinmw commented 4 years ago

Hmm, I think I compiled it with gcc 7.

yinhai86924 commented 4 years ago

My gcc version is 5.4.0 (Ubuntu 5.4.0-6ubuntu1~16.04.12), but I still have this problem: RuntimeError: cuda runtime error (700) : an illegal memory access was encountered at /tmp/pip-req-build-808afw3c/aten/src/THC/THCCachingHostAllocator.cpp:278 Segmentation fault (core dumped)
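
One thing that may be worth checking besides the gcc version: the log in the first post prints "error in modulated_deformable_im2col_cuda: invalid device function" just before the illegal memory access. That CUDA error generally means the compiled kernel contains no code for the GPU it is running on, so the DCN extension may have been built for a different compute capability than the card in use (V100 is 7.0, K80 is 3.7). This is only a guess at the cause, not a confirmed fix. The sketch below prints the capability of each visible GPU in the form expected by the TORCH_CUDA_ARCH_LIST environment variable. The helper names arch_list_for and detect_capabilities are made up for illustration, and it assumes the DCN build goes through torch.utils.cpp_extension and that the installed PyTorch version honors that variable.

```python
def arch_list_for(capabilities):
    """Format (major, minor) compute capabilities as a TORCH_CUDA_ARCH_LIST value.

    Hypothetical helper, not part of CenterTrack or DCNv2.
    """
    unique = sorted(set(capabilities))
    return ";".join("{}.{}".format(major, minor) for major, minor in unique)


def detect_capabilities():
    """Return the compute capability of every visible GPU, or [] if none is usable."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_capability(i)
            for i in range(torch.cuda.device_count())]


if __name__ == "__main__":
    caps = detect_capabilities()
    if caps:
        # Assumption: the extension build reads this variable when compiling.
        print("export TORCH_CUDA_ARCH_LIST=\"{}\"".format(arch_list_for(caps)))
    else:
        print("No CUDA device visible to PyTorch; nothing to report.")
```

If the printed value differs from what the extension was built for, exporting it and rebuilding DCN from a clean state (removing the old build directory first) on the same machine that runs the demo would be the next thing to try.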