lvwj19 / PPR-Net-plus

PPR-Net++: Accurate 6D Pose Estimation in Stacked Scenarios
Apache License 2.0

RuntimeError: arguments are located on different GPUs #20

Open gyc137 opened 5 months ago

gyc137 commented 5 months ago

When I tried to train the model, I came across this problem:

RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:481

I don't understand why this happens; I hope you can help me solve it.

And here is the detailed error:

Traceback (most recent call last):
  File "train.py", line 240, in <module>
    train(start_epoch)
  File "train.py", line 231, in train
    train_one_epoch(train_loader)
  File "train.py", line 134, in train_one_epoch
    pred_results, losses = net(inputs)
  File "/home/lg/anaconda3/envs/pprnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lg/anaconda3/envs/pprnet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 152, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/home/lg/anaconda3/envs/pprnet/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 162, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/home/lg/anaconda3/envs/pprnet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/home/lg/anaconda3/envs/pprnet/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/home/lg/anaconda3/envs/pprnet/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/lg/pprnet/PPR-Net-plus/pprnet/pprnet.py", line 147, in forward
    losses = self._compute_loss(pred_results_flatten, inputs['labels'])
  File "/home/lg/pprnet/PPR-Net-plus/pprnet/pprnet.py", line 196, in _compute_loss
    vis_label_flatten)
  File "/home/lg/pprnet/PPR-Net-plus/pprnet/pose_loss.py", line 166, in rot_loss
    rtn = self._rot_loss_finite(rot_matrix, rot_label, self.lambda_p, self.G, weight, return_pointwise_loss)
  File "/home/lg/pprnet/PPR-Net-plus/pprnet/pose_loss.py", line 241, in _rot_loss_finite
    P = torch.matmul(rot_matrix, lambda_p)  # (M, 3, 3)
RuntimeError: arguments are located on different GPUs at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:481
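[Editor's note, not part of the original report] The failing call is torch.matmul(rot_matrix, lambda_p) inside pose_loss.py, where self.lambda_p is an attribute of the loss object. Under nn.DataParallel the model is replicated onto every visible GPU, but tensors stored as plain Python attributes (not parameters or registered buffers) are shared by reference and stay on the device they were created on, so the replicas on the other GPUs mix devices in the matmul. The class and attribute names below are illustrative; this is only a minimal sketch of the pattern that produces this exact error, assuming lambda_p is a plain tensor created on cuda:0:

import torch
import torch.nn as nn

class RotLoss(nn.Module):
    """Illustrative stand-in: lambda_p is a plain attribute, so
    nn.DataParallel does NOT move it onto the replica devices."""
    def __init__(self):
        super().__init__()
        self.lambda_p = torch.eye(3).cuda()  # lives on cuda:0 only

    def forward(self, rot_matrix):           # rot_matrix: (M, 3, 3)
        return torch.matmul(rot_matrix, self.lambda_p).sum()

net = nn.DataParallel(RotLoss().cuda())      # replicas on all visible GPUs
x = torch.randn(8, 3, 3).cuda()
loss = net(x)  # with >1 GPU this raises "arguments are located on different GPUs"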

gyc137 commented 5 months ago

Training works normally when I use only one GPU instead of all 4 GPUs.
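[Editor's note, not a maintainer reply] That single-GPU behavior is consistent with a DataParallel replication issue. If lambda_p and G in pose_loss.py are plain tensors created on the default GPU, two common workarounds (untested against this repo, names are illustrative) are to register such constants as buffers so DataParallel broadcasts them to every replica, or to move them onto the input's device right before the matmul:

import torch
import torch.nn as nn

class RotLossFixed(nn.Module):
    """Two ways to keep a constant tensor on the right device under DataParallel."""
    def __init__(self):
        super().__init__()
        # Option 1: register as a buffer so each replica gets a copy on its own GPU.
        self.register_buffer('lambda_p', torch.eye(3))

    def forward(self, rot_matrix):
        # Option 2 (belt and braces): explicitly follow the input's device.
        lambda_p = self.lambda_p.to(rot_matrix.device)
        return torch.matmul(rot_matrix, lambda_p).sum()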