nerfstudio-project / nerfstudio

A collaboration friendly studio for NeRFs
https://docs.nerf.studio
Apache License 2.0

Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! #3222

Closed: ichsan2895 closed this issue 2 weeks ago

ichsan2895 commented 2 weeks ago
# erase the previous version
pip uninstall gsplat nerfstudio -y

# install the latest source as of June 13, 2024
pip install git+https://github.com/nerfstudio-project/gsplat.git@d01e6c0561f5c51d4372ff6b1d3c45f7b1e28fd5

pip install git+https://github.com/nerfstudio-project/nerfstudio.git@290f3fa028dc21da1a00dd9684e1a36c5361452f

Additional info:

Ubuntu 22.04.3 LTS
Python 3.10.12
Torch 2.0.1+cu118
RTX A6000 48GB VRAM

Error logs

Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0465              
VanillaPipeline.get_train_loss_dict: 0.0385              
VanillaPipeline.get_eval_image_metrics_and_images: 0.0135              
Trainer.eval_iteration: 0.0001              
Traceback (most recent call last):
  File "/usr/local/bin/ns-train", line 8, in <module>
    sys.exit(entrypoint())
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/scripts/train.py", line 262, in entrypoint
    main(
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/scripts/train.py", line 247, in main
    launch(
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/scripts/train.py", line 189, in launch
    main_func(local_rank=0, world_size=world_size, config=config)
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/scripts/train.py", line 100, in train_loop
    trainer.train()
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/engine/trainer.py", line 298, in train
    self.eval_iteration(step)
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/utils/decorators.py", line 70, in wrapper
    ret = func(self, *args, **kwargs)
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/engine/trainer.py", line 545, in eval_iteration
    metrics_dict, images_dict = self.pipeline.get_eval_image_metrics_and_images(step=step)
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/utils/profiler.py", line 112, in inner
    out = func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/pipelines/base_pipeline.py", line 340, in get_eval_image_metrics_and_images
    outputs = self.model.get_outputs_for_camera(camera)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/models/splatfacto.py", line 906, in get_outputs_for_camera
    outs = self.get_outputs(camera.to(self.device))
  File "/workspace/NERFSTUDIO_GSPLAT1/nerfstudio/nerfstudio/models/splatfacto.py", line 789, in get_outputs
    rgb = render[:, ..., :3] + (1 - alpha) * background
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

It works if I use the previous versions, nerfstudio==1.1.0 and gsplat==0.1.12.
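
For anyone hitting the same error before a fix is merged: the traceback ends at the compositing line in `get_outputs`, where at least one operand is on the CPU while the others are on `cuda:0`. The log does not say which tensor that is. Below is a minimal sketch of a defensive workaround, assuming the CPU tensor is `background`. The helper name `composite_over_background` and the tensor shapes are hypothetical and are not taken from the nerfstudio source or from the fixing PR.

```python
import torch


def composite_over_background(render, alpha, background):
    """Blend rendered colors over a background color, as in the failing line.

    Assumption: `background` may have been created on the CPU. It is moved to
    the device and dtype of `render` before the arithmetic, so the expression
    cannot mix cuda:0 and cpu tensors.
    """
    background = background.to(device=render.device, dtype=render.dtype)
    return render[..., :3] + (1 - alpha) * background


if __name__ == "__main__":
    # Falls back to the CPU so the sketch also runs on machines without a GPU.
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    render = torch.rand(1, 4, 4, 3, device=device)  # hypothetical [B, H, W, 3] render
    alpha = torch.rand(1, 4, 4, 1, device=device)   # per-pixel opacity
    background = torch.zeros(3)                     # created on the CPU by default
    rgb = composite_over_background(render, alpha, background)
    print(rgb.device, tuple(rgb.shape))
```

On a GPU machine, dropping the `.to(...)` call in this sketch should raise the same `RuntimeError` as in the log above, because `torch.zeros(3)` lives on the CPU unless a device is passed.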

ichsan2895 commented 2 weeks ago

Fixed by this PR: "Small fix when evaluating splatfacto model, running ns-eval". Please merge it.

maturk commented 2 weeks ago

Fixed in #3219