isl-org / Open3D-ML

An extension of Open3D to address 3D Machine Learning tasks

Summarize the bug "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!" #510

Open Nikolatesla-lj opened 2 years ago

Nikolatesla-lj commented 2 years ago

Checklist

Describe the issue

Steps to reproduce the bug

1. Run vis_pred.py.
2. The terminal reports: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Error message

/home/ub/anaconda3/envs/Genv3D/bin/python home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py
INFO - 2022-04-07 14:08:39,636 - semantic_segmentation - Loading checkpoint /home/ljian/Downloads/Open3D-ML-master/examples/vis_weights_RandLANet.pth
INFO - 2022-04-07 14:08:41,694 - semantic_segmentation - Loading checkpoint /home/ljian/Downloads/Open3D-ML-master/examples/vis_weights_KPFCNN.pth
test 0/1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 78416/78726 [00:02<00:00, 33596.46it/s]
Traceback (most recent call last):
  File "/home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py", line 163, in <module>
    main()
  File "/home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py", line 151, in main
    pcs_with_pred = pred_custom_data(pc_names, pcs, pipeline_r, pipeline_k)
  File "/home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py", line 40, in pred_custom_data
    results_r = pipeline_r.run_inference(data)
  File "/home/ub/anaconda3/envs/Genv3D/lib/python3.8/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 172, in run_inference
    valid_scores, valid_labels = filter_valid_label(
  File "/home/ub/anaconda3/envs/Genv3D/lib/python3.8/site-packages/open3d/_ml3d/torch/modules/losses/semseg_loss.py", line 19, in filter_valid_label
    valid_scores = torch.gather(valid_scores, 0,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Expected behavior

No response

Open3D, Python and System information

- Operating system: (Ubuntu 20.04)
- Python version: (e.g. Python 3.8)
- Open3D version: (open3d_ML 0.15.1)
- Is this remote workstation?: no
- How did you install Open3D?: (conda)

Additional information

No response

conby commented 2 years ago

Same here.

Open3D, Python and System information

$ python3 vis_pred.py
Open3D was not compiled with BUILD_GUI, but script is importing open3d.visualization.gui
Open3D was not compiled with BUILD_GUI, but script is importing open3d.visualization.rendering


Using the Open3D PyTorch ops with CUDA 11 may have stability issues!

We recommend to compile PyTorch from source with compile flags '-Xcompiler -fno-gnu-unique'

or use the PyTorch wheels at https://github.com/isl-org/open3d_downloads/releases/tag/torch1.8.2

Ignore this message if PyTorch has been compiled with the aforementioned flags.

See https://github.com/isl-org/Open3D/issues/3324 and https://github.com/pytorch/pytorch/issues/52663 for more information on this problem.


INFO - 2022-04-26 23:32:03,045 - semantic_segmentation - Loading checkpoint /home/x/work/Open3D-ML/examples/vis_weights_RandLANet.pth
INFO - 2022-04-26 23:32:18,718 - semantic_segmentation - Loading checkpoint /home/x/work/Open3D-ML/examples/vis_weights_KPFCNN.pth
test 0/1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 78711/78726 [00:05<00:00, 15310.08it/s]
Traceback (most recent call last):
  File "vis_pred.py", line 167, in <module>
    main()
  File "vis_pred.py", line 155, in main
    pcs_with_pred = pred_custom_data(pc_names, pcs, pipeline_r, pipeline_k)
  File "vis_pred.py", line 40, in pred_custom_data
    results_r = pipeline_r.run_inference(data)
  File "/home/x/.local/lib/python3.6/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 175, in run_inference
    model.cfg.ignored_label_inds, device)
  File "/home/x/.local/lib/python3.6/site-packages/open3d/_ml3d/torch/modules/losses/semseg_loss.py", line 20, in filter_valid_label
    valid_idx.unsqueeze(-1).expand(-1, num_classes))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_gather)
test 0/1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 78726/78726 [00:13<00:00, 5852.46it/s]

lovelyyoshino commented 2 years ago

I can see the bug in this code:

        metric = SemSegMetric()
        valid_scores, valid_labels = filter_valid_label(
            torch.tensor(inference_result['predict_scores']).to(device),
            torch.tensor(data['label']), model.cfg.num_classes,
            model.cfg.ignored_label_inds, device)
        metric.update(valid_scores, valid_labels)
        log.info(f"Accuracy : {metric.acc()}")
        log.info(f"IoU : {metric.iou()}")

The same problem is reported in https://github.com/isl-org/Open3D-ML/issues/435
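Both tracebacks end in torch.gather, which requires its input and index tensors to be on the same device. The thread does not say which tensor is misplaced, so the following is only a minimal sketch of the general pattern: move every tensor to one device before the call. The helper name to_same_device and the toy shapes are illustrative assumptions, not Open3D-ML code.

```python
import torch


def to_same_device(device, *arrays):
    """Hypothetical helper (not part of Open3D-ML): return every input
    as a tensor placed on `device`."""
    return tuple(torch.as_tensor(a).to(device) for a in arrays)


# Falls back to the CPU when no GPU is present, so the sketch runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

scores = torch.rand(5, 3)   # toy stand-in for per-point class scores
labels = [0, 2, 1, 1, 0]    # toy stand-in for per-point labels

scores_d, labels_d = to_same_device(device, scores, labels)

# gather only works when input and index share a device.
picked = torch.gather(scores_d, 1, labels_d.unsqueeze(-1))
```

On a machine with a GPU, skipping the helper for one of the two tensors would be expected to raise the same RuntimeError as in the reports above.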

RuiMargarido commented 8 months ago

We hit this error while training PointTransformer on a CustomDataset. We fixed it by changing run_inference in semantic_segmentation.py.

We changed line 161 to replicate what run_test() does, i.e., add

if hasattr(inputs['data'], 'to'):
    inputs['data'].to(device)

before

results = model(inputs['data'])
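
The fix above calls to(device) without using the return value, which only works if the object moves its tensors in place. A plain torch.Tensor behaves differently: Tensor.to returns a new tensor and leaves the original where it was. The sketch below handles both cases; ToyBatch and move_to_device are hypothetical names for illustration, and the assumption that the batch object moves its tensors in place is inferred from the unassigned call, not confirmed by the thread.

```python
import torch


class ToyBatch:
    """Hypothetical batch object whose to() moves its tensors in place.
    This is an illustration only, not an Open3D-ML class."""

    def __init__(self, points):
        self.points = points

    def to(self, device):
        self.points = self.points.to(device)


def move_to_device(data, device):
    """Move `data` to `device`, covering tensors and in-place batch objects."""
    if isinstance(data, torch.Tensor):
        # Tensor.to is not in place: the returned tensor must be used.
        return data.to(device)
    if hasattr(data, "to"):
        moved = data.to(device)
        # In-place implementations return None; keep the original object.
        return data if moved is None else moved
    return data


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

batch = move_to_device(ToyBatch(torch.zeros(4, 3)), device)
tensor = move_to_device(torch.zeros(2), device)
```

Using the returned value in both cases avoids depending on whether a given to() implementation is in place.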

anuzk13 commented 3 months ago

Thank you @RuiMargarido, this worked for me!