justanhduc / graphx-conv

Official implementation of GraphX-Convolution
https://justanhduc.github.io/2019/09/29/GraphX-Convolution.html
MIT License
62 stars 17 forks

RuntimeError: xyz1 must be a CUDA tensor #11

Closed nhy17-thu closed 3 years ago

nhy17-thu commented 3 years ago

Hi @justanhduc, I've successfully installed all the requirements including the latest Cuda version of your neuralnet-pytorch package. However, when I run python train.py configs/lowrankgraphx-up-final.gin --gpu 0, the following error message comes out:

```
2021-03-15 15:13:44,530 [MainThread ] [INFO ] Result folder: results/ICCV-lowrankgraphx-conv-up-final/run-5
Training...
Traceback (most recent call last):
  File "train.py", line 86, in <module>
    train_valid()
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/gin/config.py", line 1069, in gin_wrapper
    utils.augment_exception_message_and_reraise(e, err_str)
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/gin/utils.py", line 41, in augment_exception_message_and_reraise
    raise proxy.with_traceback(exception.__traceback__) from None
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/gin/config.py", line 1046, in gin_wrapper
    return fn(*new_args, **new_kwargs)
  File "train.py", line 80, in train_valid
    monitor.run_training(net, solver, train_loader, n_epochs, scheduler=scheduler, eval_loader=val_loader, valid_freq=val_freq, reduce='mean')
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/neuralnet_pytorch/monitor.py", line 928, in run_training
    loss = net.train_procedure(batch, *args, **kwargs)
  File "/mnt/c/Users/yydyz/OneDrive/毕业设计/graphx-conv/src/networks.py", line 300, in train_procedure
    loss = self.get_loss(batch, reduce, normalized)
  File "/mnt/c/Users/yydyz/OneDrive/毕业设计/graphx-conv/src/networks.py", line 295, in get_loss
    loss = sum([normalized_chamfer_loss(pred[None], gt[None], reduce=reduce, normalized=normalized) for pred, gt in zip(pred_pc, gt_pc)]) / len(
  File "/mnt/c/Users/yydyz/OneDrive/毕业设计/graphx-conv/src/networks.py", line 295, in <listcomp>
    loss = sum([normalized_chamfer_loss(pred[None], gt[None], reduce=reduce, normalized=normalized) for pred, gt in zip(pred_pc, gt_pc)]) / len(
  File "/mnt/c/Users/yydyz/OneDrive/毕业设计/graphx-conv/src/networks.py", line 22, in normalized_chamfer_loss
    loss = nnt.chamfer_loss(pred, gt, reduce=reduce)
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/neuralnet_pytorch/metrics.py", line 118, in chamfer_loss
    dist1, dist2 = chamfer_distance(xyz1, xyz2)
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/neuralnet_pytorch/extensions/dist_chamfer.py", line 37, in forward
    return ChamferFunction.apply(input1, input2)
  File "/home/niuhaoyu/anaconda3/envs/graphX-cuda/lib/python3.8/site-packages/neuralnet_pytorch/extensions/dist_chamfer.py", line 19, in forward
    dist1, dist2, idx1, idx2 = ext.chamfer_forward(xyz1, xyz2)
RuntimeError: xyz1 must be a CUDA tensor
  In call to configurable 'GraphX' (<function train_valid at 0x7f9b586fd820>)
2021-03-15 15:13:46,257 [Thread-1 ] [INFO ] Elapsed time 0.25mins run-5 Epoch 1 Iteration 0/210001 (0.00%)
```

My environment:

- cudatoolkit 11.0
- pytorch 1.7.1
- neuralnet-pytorch 1.0.0+fancy.166.gcbb0c5a
- python 3.8.8

Meanwhile, training on the CPU works, but it is far too slow. Could you please look into the cause of this error and help me fix it? Thanks!

justanhduc commented 3 years ago

Hi @nhy17-thu. The error says that xyz1 must be a CUDA tensor. Could you check what device the pred and gt are on?
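A quick way to perform that check (a sketch with dummy tensors; `check_devices` and the shapes are illustrative, not part of the repository):

```python
import torch

def check_devices(pred, gt):
    """Report where the two point clouds live; the CUDA Chamfer
    extension requires both of them to be on the GPU."""
    print('pred on:', pred.device)
    print('gt on:  ', gt.device)
    return pred.is_cuda and gt.is_cuda

# Dummy point clouds created on the CPU, which is exactly the
# situation that triggers "xyz1 must be a CUDA tensor".
pred = torch.randn(1, 250, 3)
gt = torch.randn(1, 250, 3)
check_devices(pred, gt)
```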

nhy17-thu commented 3 years ago

> Hi @nhy17-thu. The error says that xyz1 must be a CUDA tensor. Could you check what device the pred and gt are on?

Hi Duc! Sorry for the late response, but I have solved the problem by simply changing one line in train.py:

```python
mon.run_training(net, solver, train_loader, n_epochs, scheduler=scheduler, eval_loader=val_loader, valid_freq=val_freq, reduce='mean')
```

to

```python
mon.run_training(net, solver, train_loader, n_epochs, scheduler=scheduler, eval_loader=val_loader, valid_freq=val_freq, reduce='mean', device='cuda')
```

which tells the monitor to use the CUDA device. Maybe your code could also select the device automatically, e.g. with an if statement on nnt.cuda_available :)

Anyway, the code can run successfully now, and thank you so much for your help!
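The automatic selection suggested above could look roughly like this (a sketch; `pick_device` is a hypothetical helper, and `torch.cuda.is_available()` stands in for the `nnt.cuda_available` flag mentioned above):

```python
import torch

def pick_device(force_cpu=False):
    """Return 'cuda' when a GPU is visible, otherwise fall back to 'cpu'."""
    if force_cpu or not torch.cuda.is_available():
        return 'cpu'
    return 'cuda'

# The monitor call in train.py would then become (sketch):
# mon.run_training(net, solver, train_loader, n_epochs, scheduler=scheduler,
#                  eval_loader=val_loader, valid_freq=val_freq,
#                  reduce='mean', device=pick_device())
print(pick_device())
```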

ShoushuangPei commented 3 years ago

I have met the same problem, and adding `device='cuda'` solved it. Thanks very much.

zhangyahu1 commented 3 years ago

It seems the code does not support multiple GPUs.
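For what it's worth, the usual PyTorch pattern for multiple GPUs is to wrap the model in `nn.DataParallel`; whether neuralnet-pytorch's monitor cooperates with this is not confirmed here, so treat it as a sketch:

```python
import torch
import torch.nn as nn

# A tiny stand-in model; DataParallel splits each input batch across the
# visible GPUs, and simply calls the wrapped module when none are available.
model = nn.Linear(4, 3)
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()

out = model(torch.randn(2, 4))
print(out.shape)  # torch.Size([2, 3])
```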