threestudio-project / threestudio

A unified framework for 3D content generation.

ProlificDreamer pipeline crashes on mixed tensor devices #249

Open y22ma opened 11 months ago

y22ma commented 11 months ago

I tried to run the prolificdreamer.yaml config via the following command:

python launch.py --train --config=./configs/prolificdreamer.yaml --gpu=0 name=prolificdreamer system.prompt_processor.prompt="a fluffy sheep doll sitting on the floor"

And I got the following error:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/threestudio/threestudio/models/prompt_processors/stable_diffusion_prompt_processor.py", line 94, in spawn_func
    text_embeddings = text_encoder(tokens.input_ids)[0]
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 823, in forward
    return self.text_model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 731, in forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/clip/modeling_clip.py", line 229, in forward
    inputs_embeds = self.token_embedding(input_ids)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 162, in forward
    return F.embedding(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 2210, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
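
For context, this is the generic failure you get whenever an embedding lookup receives CUDA weights and CPU indices; a tiny standalone snippet (unrelated to threestudio, just to show the same message) reproduces it:

import torch
import torch.nn as nn

emb = nn.Embedding(1000, 16).to("cuda")  # embedding table on cuda:0, like the CLIP text encoder weights
ids = torch.tensor([[1, 2, 3]])          # token ids created on the CPU, like tokens.input_ids

try:
    emb(ids)  # raises the same "Expected all tensors to be on the same device" error
except RuntimeError as e:
    print(e)

print(emb(ids.to("cuda")).shape)  # the lookup succeeds once the ids are moved to the weights' device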

Diving deeper into the stable_diffusion_prompt_processor class, I confirmed that tokens.input_ids is still on the CPU while the embedding weights are on cuda:0. By changing the erroneous line to

device = "cuda" if torch.cuda.is_available() else "cpu"
text_embeddings = text_encoder(tokens.input_ids.to(device)[0]

the issue goes away. Is this the right fix?
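
An alternative I considered (just a sketch, not tested against the repo) is to derive the target device from the text encoder's own weights rather than hard-coding cuda, which would also cover setups where the encoder does not live on cuda:0:

# sketch only: follow the device of the encoder's parameters instead of assuming cuda
device = next(text_encoder.parameters()).device
text_embeddings = text_encoder(tokens.input_ids.to(device))[0]

Either way the tokenizer output has to be moved explicitly, since the tokenizer returns CPU tensors by default.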

bennyguo commented 11 months ago

Thanks for reporting! I think this is related to https://github.com/threestudio-project/threestudio/issues/236. Will check soon.