Following Example, but RuntimeError: Expected all tensors to be on the same device

JaeLee18 commented 1 year ago

Hello,

I installed the requirements on a fresh system, but I am getting the runtime error below. I am using CUDA 11.8, Python 3.9, and PyTorch 2.0.1+cu118:

Global seed set to 0
[INFO] ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
[INFO] Using 16bit Automatic Mixed Precision (AMP)
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] IPU available: False, using: 0 IPUs
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA A10') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[INFO] 
  | Name       | Type                           | Params
--------------------------------------------------------------
0 | geometry   | ImplicitVolume                 | 12.6 M
1 | material   | DiffuseWithPointLightMaterial  | 0     
2 | background | NeuralEnvironmentMapBackground | 448   
3 | renderer   | NeRFVolumeRenderer             | 0     
--------------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.419    Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/dreamfusion-sd/a_zoomed_out_DSLR_photo_of_a_baby_bunny_sitting_on_top_of_a_stack_of_pancakes@20230717-061936/save
[INFO] Using prompt [a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes] and negative prompt []
[INFO] Using view-dependent prompts [side]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, side view] [front]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, front view] [back]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, back view] [overhead]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, overhead view]
Traceback (most recent call last):
  File "/home/ubuntu/threestudio/launch.py", line 237, in <module>
    main(args, extras)
  File "/home/ubuntu/threestudio/launch.py", line 180, in main
    trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 529, in fit
    call._call_and_handle_interrupt(
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 42, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 568, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 954, in _run
    call._call_lightning_module_hook(self, "on_fit_start")
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/pytorch_lightning/trainer/call.py", line 144, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/ubuntu/threestudio/threestudio/systems/dreamfusion.py", line 32, in on_fit_start
    self.prompt_processor = threestudio.find(self.cfg.prompt_processor_type)(
  File "/home/ubuntu/threestudio/threestudio/utils/base.py", line 63, in __init__
    self.configure(*args, **kwargs)
  File "/home/ubuntu/threestudio/threestudio/models/prompt_processors/base.py", line 335, in configure
    self.prepare_text_embeddings()
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/lightning_utilities/core/rank_zero.py", line 32, in wrapped_fn
    return fn(*args, **kwargs)
  File "/home/ubuntu/threestudio/threestudio/models/prompt_processors/base.py", line 382, in prepare_text_embeddings
    self.spawn_func(
  File "/home/ubuntu/threestudio/threestudio/models/prompt_processors/stable_diffusion_prompt_processor.py", line 91, in spawn_func
    text_embeddings = text_encoder(tokens.input_ids)[0]
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 822, in forward
    return self.text_model(
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 730, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/transformers/models/clip/modeling_clip.py", line 227, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/home/ubuntu/miniconda3/envs/torch118/lib/python3.9/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

thuliu-yt16 commented 1 year ago

What command are you running?

joeyeschner commented 1 year ago

Hi,

I'm using the Docker version and I get the same RuntimeError with fantasia3d, ProlificDreamer, and dreamfusion-sd. dreamfusion-if and magic3d work as expected; I have not tried the rest yet.

I used the example prompts (e.g. dreamer@b7f0adb5acdb:~/threestudio$ python launch.py --config configs/fantasia3d.yaml --train --gpu 0 system.prompt_processor.prompt="a DSLR photo of an ice cream sundae" system.renderer.context_type=cuda), additionally setting the renderer context to cuda because Docker does not support the OpenGL one.

rotabulo commented 1 year ago

Hi, I fixed the issue by replacing line 91 of https://github.com/threestudio-project/threestudio/blob/main/threestudio/models/prompt_processors/stable_diffusion_prompt_processor.py#L91, which reads text_embeddings = text_encoder(tokens.input_ids)[0], with text_embeddings = text_encoder(tokens.input_ids.to(text_encoder.device))[0]
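Why this works: the traceback suggests the text encoder's embedding weights live on cuda:0 while the token ids from the tokenizer are still on the CPU (tokenizers called with return_tensors="pt" return CPU tensors), so the embedding lookup (index_select) sees two devices. Moving the ids to the encoder's device before the call removes the mismatch. Below is a minimal sketch of the same pattern, assuming a plain torch.nn.Embedding as a hypothetical stand-in for the CLIP text encoder. A plain nn.Module has no .device attribute (transformers models such as text_encoder do), so the hypothetical helper reads the device from the module's first parameter instead.

```python
import torch
import torch.nn as nn


def call_on_module_device(module: nn.Module, input_ids: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: move input_ids to the module's device, then call it."""
    # A plain nn.Module has no .device attribute (transformers models do),
    # so take the device of the module's first parameter instead.
    device = next(module.parameters()).device
    return module(input_ids.to(device))


# Use the GPU if one is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical stand-in for the CLIP token embedding (tiny sizes for illustration).
embedding = nn.Embedding(num_embeddings=100, embedding_dim=8).to(device)

# Token ids created on the CPU, the way a tokenizer returns them by default.
input_ids = torch.tensor([[1, 2, 3, 4]])

# On a CUDA device, embedding(input_ids) would raise the same
# "Expected all tensors to be on the same device" error; moving the ids first avoids it.
out = call_on_module_device(embedding, input_ids)
print(out.shape, out.device)
```

On a CPU-only machine both tensors already share a device, so the helper is a no-op there; it only matters when the module has been moved to a GPU.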

joeyeschner commented 1 year ago

Thanks a lot, that works!

thuliu-yt16 commented 1 year ago

I will check whether this also happens in a normal terminal. I suspect it may be related to the Docker environment.

santisy commented 1 year ago

I also encountered this issue, and it was in a normal terminal, but only with a newly created environment (running the same code). So it is quite strange.

mrtpk commented 1 year ago

I had the same issue in the provided Docker image. The fix by @rotabulo worked.

thuliu-yt16 commented 1 year ago

I was finally able to reproduce the error after updating this package:

transformers-4.28.1 -> 4.31.0

along with some updates for compatibility:

huggingface-hub-0.13.4 -> 0.16.4
safetensors-0.3.1 newly installed
accelerate-0.18.0 -> 0.21.0

I will fix this in the next commit. It seems that transformers changed some related code.

Thank you @rotabulo for the solution, and thank you all for finding this silly bug.

thuliu-yt16 commented 1 year ago

This should be fixed in #258. We also recommend using transformers==4.28.1 to avoid potential errors when using DeepFloyd.
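To see which versions an environment actually has before reporting or pinning, a small sketch like the following may help. It uses only the standard library's importlib.metadata; the helper names are hypothetical, and the package list is simply the set mentioned in this thread. Only the transformers==4.28.1 pin is an explicit recommendation here, so it is the only one flagged.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, List, Optional, Tuple

# Packages mentioned in this thread; only the transformers pin is a stated recommendation.
PACKAGES = ["transformers", "huggingface-hub", "safetensors", "accelerate"]
RECOMMENDED = {"transformers": "4.28.1"}


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: return the installed version of a package, or None."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages: List[str], recommended: Dict[str, str]) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Hypothetical helper: build (package, installed, recommended) rows."""
    return [(pkg, installed_version(pkg), recommended.get(pkg)) for pkg in packages]


for pkg, found, pin in report(PACKAGES, RECOMMENDED):
    note = ""
    # Flag only packages that have a recommended pin and do not match it exactly.
    if pin is not None and found != pin:
        note = f"  <- recommended: pip install {pkg}=={pin}"
    print(f"{pkg}: {found or 'not installed'}{note}")
```

The check is an exact string comparison, so it flags any version other than the pinned one rather than trying to decide which newer releases are safe.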

threestudio-project / threestudio

Following Example, but RuntimeError: Expected all tensors to be on the same device #236