How to resolve “RuntimeError: CUDA error: device-side assert triggered”?

Hi qiaoguan, I am very interested in your GitHub project “Person-reid-GAN-pytorch”. I followed the steps in README.md and downloaded the “Market-1501” dataset. Because I have only one GPU, I modified the code. When I then ran the command “python train_baseline.py --use_dense”,

the following errors have appeared:

"12936 751
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:14: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.
  init.kaiming_normal(m.weight.data, a=0, mode='fan_out')
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:15: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(m.bias.data, 0.0)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:17: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
  init.normal(m.weight.data, 1.0, 0.02)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:18: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(m.bias.data, 0.0)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:23: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
  init.normal(m.weight.data, std=0.001)
/home/jenniferwu/Documents/Python_project/Person-reid-GAN-pytorch-master/model.py:24: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
  init.constant(m.bias.data, 0.0)
Epoch 0/12
/root/anaconda3/lib/python3.7/site-packages/torch/optim/lr_scheduler.py:136: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)
train_baseline.py:165: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  flos=F.log_softmax(input) # N*K? batchsize*751
train_baseline.py:167: UserWarning: Implicit dimension choice for log_softmax has been deprecated. Change the call to include dim=X as an argument.
  logpt=F.log_softmax(input) # size: batchsize*751

Traceback (most recent call last):
  File "train_baseline.py", line 349, in <module>
    num_epochs=13)
  File "train_baseline.py", line 251, in train_model
    loss.backward()
  File "/root/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 221, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/root/anaconda3/lib/python3.7/site-packages/torch/autograd/__init__.py", line 132, in backward
    allow_unreachable=True) # allow_unreachable flag
RuntimeError: CUDA error: device-side assert triggered"

So, could you please give me some suggestions on how to resolve “RuntimeError: CUDA error: device-side assert triggered”? Many thanks!
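
A possible starting point for narrowing this down, offered as a sketch and not as a confirmed fix for this repository: a device-side assert raised during a classification loss is often caused by a class label outside the valid range, and the log above reports 751 classes, so valid labels would be 0 to 750. The helper below is hypothetical (the name find_out_of_range_labels and the sample batch are assumptions, not code from train_baseline.py). It works on plain Python integers, so a label tensor would first need to be converted, for example with labels.tolist().

```python
def find_out_of_range_labels(labels, num_classes):
    """Return (position, label) pairs for labels outside [0, num_classes)."""
    return [
        (i, int(y))
        for i, y in enumerate(labels)
        if not 0 <= int(y) < num_classes
    ]


if __name__ == "__main__":
    # Hypothetical batch of labels; with 751 classes, valid values are 0..750.
    batch_labels = [0, 17, 750, 751, -1]
    bad = find_out_of_range_labels(batch_labels, num_classes=751)
    print(bad)  # [(3, 751), (4, -1)]
```

If such a check reports nothing, rerunning with the environment variable CUDA_LAUNCH_BLOCKING=1, or running a single batch on the CPU, usually yields a traceback that points at the operation that actually failed, since CUDA errors are otherwise reported asynchronously.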
