Open garrett-cn opened 5 years ago
Have you solved the problem? I have the same problem as you.
I'm facing the same error. Any clue on solving this?
I have the same problem.
I also ran into this problem.
self.depth = Variable(data['depth']).cuda()
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorCopy.c:21
I tried the following and finally managed to train. I hope it helps you.
1. Check the number of classes in your dataset. I was using the SUNRGBD dataset but forgot to change label_class.
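One way to run the check described above is to scan the label maps for values outside the range the model expects. This is only a rough sketch, not code from the repository: `check_label_range`, the `ignore_index` value of 255, and the toy array are all assumptions for illustration. A label value at or above the configured class count may index past the end of a GPU buffer, which is one plausible cause of an illegal memory access like the one reported here.

```python
import numpy as np

def check_label_range(label, num_classes, ignore_index=255):
    """Return the label values that fall outside [0, num_classes).

    `ignore_index` is an assumed "void" value that is skipped;
    change or drop it to match your dataset.
    """
    values = np.unique(np.asarray(label))  # sorted unique label values
    return [int(v) for v in values
            if v != ignore_index and not (0 <= v < num_classes)]

# Toy label map standing in for one loaded segmentation mask.
toy = np.array([[0, 5, 39],
                [40, 255, 12]])

# With 40 classes, valid labels are 0..39, so 40 is out of range.
print(check_label_range(toy, num_classes=40))  # [40]
```

If the returned list is not empty for some image, either the class count passed to the model or the label encoding of the dataset probably needs to change.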
Sorry, I didn't understand your solution.
CUDA_LAUNCH_BLOCKING=1 python train.py --name nyuv2_VGGdeeplab_depthconv --dataset_mode nyuv2 --flip --scale --crop --colorjitter --depthconv --list dataset/sunrgbd_training.lst
------------ Options -------------
batchSize: 1
beta1: 0.5
checkpoints_dir: ./checkpoints
colorjitter: True
continue_train: False
crop: True
dataroot:
dataset_mode: nyuv2
debug: False
decoder: psp_bilinear
depthconv: True
depthglobalpool: False
display_freq: 100
display_winsize: 512
encoder: resnet50_dilated8
fineSize: [480, 640]
flip: True
gpu_ids: [0]
inputmode: bgr-mean
isTrain: True
iterSize: 10
label_nc: 40
list: dataset/sunrgbd_training.lst
loadfroms: False
lr: 0.00025
lr_power: 0.9
max_dataset_size: inf
maxbatchsize: -1
model: DeeplabVGG
momentum: 0.9
nThreads: 1
name: nyuv2_VGGdeeplab_depthconv
nepochs: 100
no_html: False
phase: train
pretrained_model:
pretrained_model_HHA:
pretrained_model_rgb:
print_freq: 100
save_epoch_freq: 10
save_latest_freq: 1000
scale: True
serial_batches: False
tf_log: False
use_softmax: False
vallist:
verbose: False
warmup_iters: 500
wd: 0.0004
which_epoch: latest
which_epoch_HHA: latest
which_epoch_rgb: latest
-------------- End ----------------
CustomDatasetDataLoader
dataset [NYUDataset] was created
training images = 795
model [BaseModel] was created
create web directory ./checkpoints/nyuv2_VGGdeeplab_depthconv/web...
/home/cgl/miniconda3/envs/torch0.40/lib/python3.6/site-packages/torch/nn/functional.py:1749: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
"See the documentation of nn.Upsample for details.".format(mode))
/home/cgl/code/Git/DepthAwareCNN/models/Deeplab.py:106: UserWarning: invalid index of a 0-dim tensor. This will be an error in PyTorch 0.5. Use tensor.item() to convert a 0-dim tensor to a Python number
self.averageloss += [self.loss.data[0]]
error in depthconv_col2im: an illegal memory access was encountered
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=26 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
File "train.py", line 55, in <module>
model.backward(total_steps, opt.nepochs * dataset.__len__() * opt.batchSize + 1)
File "/home/cgl/code/Git/DepthAwareCNN/models/Deeplab.py", line 112, in backward
self.loss.backward()
File "/home/cgl/miniconda3/envs/torch0.40/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/home/cgl/miniconda3/envs/torch0.40/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
allow_unreachable=True) # allow_unreachable flag
File "/home/cgl/code/Git/DepthAwareCNN/models/ops/depthconv/functions/depthconv.py", line 91, in backward
gradweight = weight.new(*weight.size()).zero_()
RuntimeError: cuda runtime error (77) : an illegal memory access was encountered at /pytorch/aten/src/THC/generic/THCTensorMath.cu:26