Summarize the bug "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!"

Nikolatesla-lj commented 2 years ago

Checklist

[X] I have searched for similar issues.
[X] I have tested with the latest development wheel.
[X] I have checked the release documentation and the latest documentation (for master branch).

Describe the issue

Steps to reproduce the bug

1. Run vis_pred.py.
2. The terminal reports: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Error message

/home/ub/anaconda3/envs/Genv3D/bin/python home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py
INFO - 2022-04-07 14:08:39,636 - semantic_segmentation - Loading checkpoint /home/ljian/Downloads/Open3D-ML-master/examples/vis_weights_RandLANet.pth
INFO - 2022-04-07 14:08:41,694 - semantic_segmentation - Loading checkpoint /home/ljian/Downloads/Open3D-ML-master/examples/vis_weights_KPFCNN.pth
test 0/1: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▍| 78416/78726 [00:02<00:00, 33596.46it/s]
Traceback (most recent call last):
  File "/home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py", line 163, in <module>
    main()
  File "/home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py", line 151, in main
    pcs_with_pred = pred_custom_data(pc_names, pcs, pipeline_r, pipeline_k)
  File "/home/ub/Downloads/Open3D-ML-master/examples/vis_pred.py", line 40, in pred_custom_data
    results_r = pipeline_r.run_inference(data)
  File "/home/ub/anaconda3/envs/Genv3D/lib/python3.8/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 172, in run_inference
    valid_scores, valid_labels = filter_valid_label(
  File "/home/ub/anaconda3/envs/Genv3D/lib/python3.8/site-packages/open3d/_ml3d/torch/modules/losses/semseg_loss.py", line 19, in filter_valid_label
    valid_scores = torch.gather(valid_scores, 0,
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

No response

Expected behavior

No response

Open3D, Python and System information

- Operating system: (Ubuntu 20.04)
- Python version: (e.g. Python 3.8)
- Open3D version: (open3d_ML 0.15.1)
- Is this remote workstation?: no
- How did you install Open3D?: (conda)

Additional information

No response

conby commented 2 years ago

Same here.

Open3D, Python and System information

Operating system: (Ubuntu 18.04, Nvidia Jetson/aarch64)
Python version: (e.g. Python 3.6)
Open3D version: (open3d_ML 0.15.1)
Is this remote workstation?: no
How did you install Open3D?: (build from source)

$ python3 vis_pred.py
Open3D was not compiled with BUILD_GUI, but script is importing open3d.visualization.gui
Open3D was not compiled with BUILD_GUI, but script is importing open3d.visualization.rendering

Using the Open3D PyTorch ops with CUDA 11 may have stability issues!

We recommend to compile PyTorch from source with compile flags '-Xcompiler -fno-gnu-unique'

or use the PyTorch wheels at https://github.com/isl-org/open3d_downloads/releases/tag/torch1.8.2

Ignore this message if PyTorch has been compiled with the aforementioned flags.

See https://github.com/isl-org/Open3D/issues/3324 and https://github.com/pytorch/pytorch/issues/52663 for more information on this problem.

INFO - 2022-04-26 23:32:03,045 - semantic_segmentation - Loading checkpoint /home/x/work/Open3D-ML/examples/vis_weights_RandLANet.pth
INFO - 2022-04-26 23:32:18,718 - semantic_segmentation - Loading checkpoint /home/x/work/Open3D-ML/examples/vis_weights_KPFCNN.pth
test 0/1: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████▉| 78711/78726 [00:05<00:00, 15310.08it/s]
Traceback (most recent call last):
  File "vis_pred.py", line 167, in <module>
    main()
  File "vis_pred.py", line 155, in main
    pcs_with_pred = pred_custom_data(pc_names, pcs, pipeline_r, pipeline_k)
  File "vis_pred.py", line 40, in pred_custom_data
    results_r = pipeline_r.run_inference(data)
  File "/home/x/.local/lib/python3.6/site-packages/open3d/_ml3d/torch/pipelines/semantic_segmentation.py", line 175, in run_inference
    model.cfg.ignored_label_inds, device)
  File "/home/x/.local/lib/python3.6/site-packages/open3d/_ml3d/torch/modules/losses/semseg_loss.py", line 20, in filter_valid_label
    valid_idx.unsqueeze(-1).expand(-1, num_classes))
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper_gather)
test 0/1: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 78726/78726 [00:13<00:00, 5852.46it/s]

lovelyyoshino commented 2 years ago

I can see the bug in this code:

        metric = SemSegMetric()
        valid_scores, valid_labels = filter_valid_label(
            torch.tensor(inference_result['predict_scores']).to(device),
            torch.tensor(data['label']), model.cfg.num_classes,
            model.cfg.ignored_label_inds, device)
        metric.update(valid_scores, valid_labels)
        log.info(f"Accuracy : {metric.acc()}")
        log.info(f"IoU : {metric.iou()}")

See also https://github.com/isl-org/Open3D-ML/issues/435
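Both tracebacks in this thread end in the torch.gather call inside filter_valid_label, which raises this error whenever its input and index tensors sit on different devices. A minimal sketch of keeping the two aligned, assuming a hypothetical helper name (gather_valid_scores is not part of Open3D-ML) and toy data that runs on CPU:

```python
import torch


def gather_valid_scores(scores, valid_idx):
    """Select rows of `scores` by `valid_idx` (hypothetical helper, not Open3D-ML code)."""
    # Put the index on the same device as the scores before gathering;
    # torch.gather raises the "same device" RuntimeError otherwise.
    valid_idx = valid_idx.to(scores.device)
    num_classes = scores.shape[1]
    # Same gather pattern as in the tracebacks above.
    return torch.gather(scores, 0,
                        valid_idx.unsqueeze(-1).expand(-1, num_classes))


# Toy data: 4 points, 3 classes; keep points 0 and 2.
scores = torch.arange(12, dtype=torch.float32).reshape(4, 3)
valid_idx = torch.tensor([0, 2])
out = gather_valid_scores(scores, valid_idx)
print(out.shape)
```

The same idea applies in the other direction: moving the scores to the device of the index would also satisfy torch.gather.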

RuiMargarido commented 1 year ago

We hit this error while training PointTransformer on a CustomDataset. We fixed it by changing run_inference in semantic_segmentation.py.

We changed line 161 to replicate what run_test() does, i.e. add

if hasattr(inputs['data'], 'to'):
    inputs['data'].to(device)

before

results = model(inputs['data'])
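One caveat about the snippet above: it discards the return value of .to(device), so it only has an effect if inputs['data'] is an object that moves its contents in place. torch.Tensor.to returns a new tensor instead of modifying the original. A minimal sketch of a helper that covers both cases, assuming hypothetical stand-in classes (InPlaceBatch and CopyingTensor are illustrations, not real Open3D-ML or PyTorch types):

```python
def move_to_device(obj, device):
    """Move obj to device if it has a .to() method, else return it unchanged.

    Handles .to() implementations that return a moved copy (as
    torch.Tensor.to does) and ones that mutate in place and return None.
    """
    if not hasattr(obj, "to"):
        return obj
    moved = obj.to(device)
    return obj if moved is None else moved


# Hypothetical stand-ins, for illustration only.
class InPlaceBatch:
    """Mimics a batch container whose .to() mutates itself and returns None."""

    def __init__(self):
        self.device = "cpu"

    def to(self, device):
        self.device = device


class CopyingTensor:
    """Mimics the copy-returning behavior of torch.Tensor.to()."""

    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return CopyingTensor(device)


batch = move_to_device(InPlaceBatch(), "cuda:0")
tensor = move_to_device(CopyingTensor(), "cuda:0")
plain = move_to_device([1, 2, 3], "cuda:0")
print(batch.device, tensor.device, plain)
```

With a helper like this, the call site would become inputs['data'] = move_to_device(inputs['data'], device), which behaves the same whichever kind of object is passed.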

anuzk13 commented 8 months ago

Thank you @RuiMargarido this worked for me!
