threestudio-project / threestudio

A unified framework for 3D content generation.
Apache License 2.0
6.32k stars, 480 forks

Multiple errors, WSL2 Ubuntu #148

Closed: Ainaemaet closed this issue 1 year ago

Ainaemaet commented 1 year ago

Hello, I'm running into all sorts of issues here, culminating in a "FileNotFoundError: Text embedding file .threestudio_cache/text_embeddings/380af1c90b3b8ac914fde9dd32b144db.pt for model DeepFloyd/IF-I-XL-v1.0 and prompt [a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes] not found."

This is a fresh environment and install, and the problem only occurs when trying the DeepFloyd method. I'm assuming it has something to do with authentication, but I really have no idea.

(threestudio) username@MYPC:~/foldername/mediagen/threestudio$ python launch.py --config configs/dreamfusion-if.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"
Global seed set to 0
[INFO] ModelCheckpoint(save_last=True, save_top_k=-1, monitor=None) will duplicate the last checkpoint saved.
[INFO] Using 16bit Automatic Mixed Precision (AMP)
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] IPU available: False, using: 0 IPUs
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA GeForce RTX 4090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[INFO]
  | Name       | Type                           | Params
--------------------------------------------------------------
0 | geometry   | ImplicitVolume                 | 12.6 M
1 | material   | DiffuseWithPointLightMaterial  | 0
2 | background | NeuralEnvironmentMapBackground | 448
3 | renderer   | NeRFVolumeRenderer             | 0
--------------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.419    Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/dreamfusion-if/a_zoomed_out_DSLR_photo_of_a_baby_bunny_sitting_on_top_of_a_stack_of_pancakes@20230617-052807/save
[INFO] Using prompt [a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes] and negative prompt []
[INFO] Using view-dependent prompts [side]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, side view] [front]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, front view] [back]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, back view] [overhead]:[a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes, overhead view]

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
  warn("The installed version of bitsandbytes was compiled without GPU support. "
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/username/anaconda3/envs/threestudio did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/conda/lib'), PosixPath('/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/cv2/../../lib64')}
  warn(msg)
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/cv2/../../lib64:/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/cv2/../../lib64::/opt/conda/lib/ did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('unix')}
  warn(msg)
CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
CUDA exception! Error code: no CUDA-capable device is detected
CUDA exception! Error code: initialization error
CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: No GPU detected! Check your CUDA paths. Proceeding to load CPU-only library...
  warn(msg)
CUDA SETUP: Detected CUDA version 121
CUDA SETUP: Loading binary /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so...
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/username/foldername/mediagen/threestudio/threestudio/models/prompt_processors/deepfloyd_prompt_processor.py", line 61, in spawn_func
    text_encoder = T5EncoderModel.from_pretrained(
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2881, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3228, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/modeling_utils.py", line 728, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/transformers/utils/bitsandbytes.py", line 89, in set_module_quantized_tensor_to_device
    new_value = bnb.nn.Int8Params(new_value, requires_grad=False, **kwargs).to(device)
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 294, in to
    return self.cuda(device)
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 258, in cuda
    CB, CBt, SCB, SCBt, coo_tensorB = bnb.functional.double_quant(B)
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1987, in double_quant
    row_stats, col_stats, nnz_row_ptr = get_colrow_absmax(
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/functional.py", line 1876, in get_colrow_absmax
    lib.cget_col_row_stats(ptrA, ptrRowStats, ptrColStats, ptrNnzrows, ct.c_float(threshold), rows, cols)
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/home/username/anaconda3/envs/threestudio/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cget_col_row_stats
╭───────────────────── Traceback (most recent call last) ─────────────────────╮
│ /home/username/foldername/mediagen/threestudio/launch.py:180 in        │
│ <module>                                                                    │
│                                                                             │
│   177                                                                       │
│   178                                                                       │
│   179 if __name__ == "__main__":                                            │
│ ❱ 180 │   main()                                                            │
│   181                                                                       │
│                                                                             │
│ /home/username/foldername/mediagen/threestudio/launch.py:164 in main   │
│                                                                             │
│   161 │   │   system.set_resume_status(ckpt["epoch"], ckpt["global_step"])  │
│   162 │                                                                     │
│   163 │   if args.train:                                                    │
│ ❱ 164 │   │   trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume)      │
│   165 │   │   trainer.test(system, datamodule=dm)                           │
│   166 │   elif args.validate:                                               │
│   167 │   │   # manually set epoch and global_step as they cannot be automa │
│                                                                             │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/trainer.py:531 in fit                                      │
│                                                                             │
│    528 │   │   """                                                          │
│    529 │   │   model = _maybe_unwrap_optimized(model)                       │
│    530 │   │   self.strategy._lightning_module = model                      │
│ ❱  531 │   │   call._call_and_handle_interrupt(                             │
│    532 │   │   │   self, self._fit_impl, model, train_dataloaders, val_data │
│    533 │   │   )                                                            │
│    534                                                                      │
│                                                                             │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/call.py:42 in _call_and_handle_interrupt                   │
│                                                                             │
│    39 │   try:                                                              │
│    40 │   │   if trainer.strategy.launcher is not None:                     │
│    41 │   │   │   return trainer.strategy.launcher.launch(trainer_fn, *args │
│ ❱  42 │   │   return trainer_fn(*args, **kwargs)                            │
│    43 │                                                                     │
│    44 │   except _TunerExitException:                                       │
│    45 │   │   _call_teardown_hook(trainer)                                  │
│                                                                             │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/trainer.py:570 in _fit_impl                                │
│                                                                             │
│    567 │   │   │   model_provided=True,                                     │
│    568 │   │   │   model_connected=self.lightning_module is not None,       │
│    569 │   │   )                                                            │
│ ❱  570 │   │   self._run(model, ckpt_path=ckpt_path)                        │
│    571 │   │                                                                │
│    572 │   │   assert self.state.stopped                                    │
│    573 │   │   self.training = False                                        │
│                                                                             │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/trainer.py:956 in _run                                     │
│                                                                             │
│    953 │   │   # hook                                                       │
│    954 │   │   if self.state.fn == TrainerFn.FITTING:                       │
│    955 │   │   │   call._call_callback_hooks(self, "on_fit_start")          │
│ ❱  956 │   │   │   call._call_lightning_module_hook(self, "on_fit_start")   │
│    957 │   │                                                                │
│    958 │   │   _log_hyperparams(self)                                       │
│    959                                                                      │
│                                                                             │
│ /home/username/anaconda3/envs/threestudio/lib/python3.10/site-packages/pytorch_l │
│ ightning/trainer/call.py:140 in _call_lightning_module_hook                 │
│                                                                             │
│   137 │   pl_module._current_fx_name = hook_name                            │
│   138 │                                                                     │
│   139 │   with trainer.profiler.profile(f"[LightningModule]{pl_module.__cla │
│ ❱ 140 │   │   output = fn(*args, **kwargs)                                  │
│   141 │                                                                     │
│   142 │   # restore current_fx when nested context                          │
│   143 │   pl_module._current_fx_name = prev_fx_name                         │
│                                                                             │
│ /home/username/foldername/mediagen/threestudio/threestudio/systems/dre │
│ amfusion.py:32 in on_fit_start                                              │
│                                                                             │
│    29 │   def on_fit_start(self) -> None:                                   │
│    30 │   │   super().on_fit_start()                                        │
│    31 │   │   # only used in training                                       │
│ ❱  32 │   │   self.prompt_processor = threestudio.find(self.cfg.prompt_proc │
│    33 │   │   │   self.cfg.prompt_processor                                 │
│    34 │   │   )                                                             │
│    35 │   │   self.guidance = threestudio.find(self.cfg.guidance_type)(self │
│                                                                             │
│ /home/username/foldername/mediagen/threestudio/threestudio/utils/base. │
│ py:63 in __init__                                                           │
│                                                                             │
│   60 │   │   super().__init__()                                             │
│   61 │   │   self.cfg = parse_structured(self.Config, cfg)                  │
│   62 │   │   self.device = get_device()                                     │
│ ❱ 63 │   │   self.configure(*args, **kwargs)                                │
│   64 │                                                                      │
│   65 │   def configure(self, *args, **kwargs) -> None:                      │
│   66 │   │   pass                                                           │
│                                                                             │
│ /home/username/foldername/mediagen/threestudio/threestudio/models/prom │
│ pt_processors/base.py:336 in configure                                      │
│                                                                             │
│   333 │   │   ]                                                             │
│   334 │   │                                                                 │
│   335 │   │   self.prepare_text_embeddings()                                │
│ ❱ 336 │   │   self.load_text_embeddings()                                   │
│   337 │                                                                     │
│   338 │   @staticmethod                                                     │
│   339 │   def spawn_func(pretrained_model_name_or_path, prompts, cache_dir) │
│                                                                             │
│ /home/username/foldername/mediagen/threestudio/threestudio/models/prom │
│ pt_processors/base.py:392 in load_text_embeddings                           │
│                                                                             │
│   389 │   def load_text_embeddings(self):                                   │
│   390 │   │   # synchronize, to ensure the text embeddings have been comput │
│   391 │   │   barrier()                                                     │
│ ❱ 392 │   │   self.text_embeddings = self.load_from_cache(self.prompt)[None │
│   393 │   │   self.uncond_text_embeddings = self.load_from_cache(self.negat │
│   394 │   │   │   None, ...                                                 │
│   395 │   │   ]                                                             │
│                                                                             │
│ /home/username/foldername/mediagen/threestudio/threestudio/models/prom │
│ pt_processors/base.py:410 in load_from_cache                                │
│                                                                             │
│   407 │   │   │   f"{hash_prompt(self.cfg.pretrained_model_name_or_path, pr │
│   408 │   │   )                                                             │
│   409 │   │   if not os.path.exists(cache_path):                            │
│ ❱ 410 │   │   │   raise FileNotFoundError(                                  │
│   411 │   │   │   │   f"Text embedding file {cache_path} for model {self.cf │
│   412 │   │   │   )                                                         │
│   413 │   │   return torch.load(cache_path, map_location=self.device)       │
╰─────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: Text embedding file
.threestudio_cache/text_embeddings/380af1c90b3b8ac914fde9dd32b144db.pt for
model DeepFloyd/IF-I-XL-v1.0 and prompt [a zoomed out DSLR photo of a baby
bunny sitting on top of a stack of pancakes] not found.

DSaurus commented 1 year ago

Hi,

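It appears that the prompt processor fails to generate the text embedding, which then results in the FileNotFoundError. I believe the root cause is bitsandbytes: it fell back to the CPU-only libbitsandbytes_cpu.so, which lacks the cget_col_row_stats symbol needed to load the T5 text encoder in 8-bit. You can refer to this issue https://github.com/TimDettmers/bitsandbytes/issues/156 to resolve it.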
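The traceback shows `load_from_cache` building the cache path from `hash_prompt(...)` and raising if that `.pt` file does not exist, so the FileNotFoundError is only a symptom: the spawned `spawn_func` process crashed inside bitsandbytes before it could write the embedding. Below is a minimal sketch of that lookup. It assumes `hash_prompt` is an MD5 hex digest over the model name and prompt (the 32-hex-character filename is consistent with MD5), but the identifier format used here is a hypothetical stand-in, so it will not necessarily reproduce the hash in the log.

```python
import hashlib
import os


def hash_prompt(model: str, prompt: str) -> str:
    """Hypothetical stand-in for threestudio's hash_prompt.

    Assumption: MD5 hex digest over "<model>-<prompt>". The real helper's
    identifier format may differ.
    """
    identifier = f"{model}-{prompt}"
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


def embedding_cache_path(cache_dir: str, model: str, prompt: str) -> str:
    """Build the .pt path under which the embedding for (model, prompt) is cached."""
    return os.path.join(cache_dir, f"{hash_prompt(model, prompt)}.pt")


def require_cached_embedding(cache_dir: str, model: str, prompt: str) -> str:
    """Return the cache path, or raise if the encoding step never wrote it."""
    path = embedding_cache_path(cache_dir, model, prompt)
    if not os.path.exists(path):
        # Same condition as the check in load_from_cache: if the spawned
        # text-encoder process crashed, no file exists at this path.
        raise FileNotFoundError(
            f"Text embedding file {path} for model {model} "
            f"and prompt [{prompt}] not found."
        )
    return path
```

Because the path depends only on the model and prompt, rerunning after the bitsandbytes crash is fixed should regenerate the same file name rather than require clearing the cache.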

Ainaemaet commented 1 year ago

Thank you! I just saw another post about a bitsandbytes issue, and I've had to solve it before for other programs, so I have a good feeling that this should work. I will update and close this after I get some sleep and a chance to check it!

Ainaemaet commented 1 year ago

For anybody who needs it: I fixed the issue by following the advice on the page @DSaurus linked, which is to copy the CUDA version of the bitsandbytes library over libbitsandbytes_cpu.so. Specifically, since I'm running Python 3.10.9 and CUDA 11.8, I went into my env directory and copied lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda118.so (or whichever file matches your CUDA version) over libbitsandbytes_cpu.so in the same folder. I also added the following lines to .bashrc (be sure to source it after making the changes):

    export LD_LIBRARY_PATH="/usr/lib/wsl/lib:/usr/local/cuda/lib64"
    export PATH=/usr/local/cuda-11.8/bin${PATH:+:${PATH}}
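The library swap described above can be scripted. This is a minimal sketch with hypothetical helper names (`replace_cpu_library`, `find_libcudart`); it assumes the bitsandbytes package directory contains files named `libbitsandbytes_cpu.so` and `libbitsandbytes_cuda<tag>.so`, as in the paths quoted in this thread. It keeps a one-time backup of the original CPU library so the change can be undone, and includes a simplified version of the `libcudart.so` lookup that the bitsandbytes warnings in the log complain about.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional


def replace_cpu_library(pkg_dir: str, cuda_tag: str = "118") -> Path:
    """Hypothetical helper: copy libbitsandbytes_cuda<tag>.so over libbitsandbytes_cpu.so.

    A .bak copy of the original CPU library is made only once, so repeated
    runs never overwrite the pristine backup with the CUDA build.
    """
    pkg = Path(pkg_dir)
    cuda_lib = pkg / f"libbitsandbytes_cuda{cuda_tag}.so"
    cpu_lib = pkg / "libbitsandbytes_cpu.so"
    if not cuda_lib.is_file():
        raise FileNotFoundError(f"{cuda_lib} not found; check the CUDA version tag")
    backup = pkg / "libbitsandbytes_cpu.so.bak"
    if cpu_lib.is_file() and not backup.exists():
        shutil.copy2(cpu_lib, backup)  # back up the original CPU-only library
    shutil.copy2(cuda_lib, cpu_lib)
    return cpu_lib


def find_libcudart(search_dirs: Iterable[str]) -> Optional[str]:
    """Return the first directory that contains libcudart.so, or None.

    Pass e.g. os.environ.get("LD_LIBRARY_PATH", "").split(":"); empty
    entries and non-existent directories are skipped.
    """
    for d in search_dirs:
        if d and (Path(d) / "libcudart.so").exists():
            return d
    return None
```

For the setup in this thread, the call would be `replace_cpu_library("<env>/lib/python3.10/site-packages/bitsandbytes", "118")`, with the tag adjusted to your CUDA version. Running `find_libcudart` on `LD_LIBRARY_PATH` after sourcing `.bashrc` is a quick way to confirm the exported paths actually contain the CUDA runtime.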

After that, everything seems to be running perfectly! :)

bennyguo commented 1 year ago

Glad to hear this and thanks for sharing the solution! @Ainaemaet

FeiiYin commented 10 months ago

Hi, I got the code running by changing the call at line 61 in deepfloyd_prompt_processor.py, which reads:

    text_encoder = T5EncoderModel.from_pretrained(
        pretrained_model_name_or_path,
        subfolder="text_encoder",
        torch_dtype=torch.float16,  # suppress warning
        load_in_8bit=True,
        variant="8bit",
        device_map="auto",
    )

I simply set load_in_8bit=False and it worked. Does this hurt performance, or does it just make things a bit slower?
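One concrete effect of setting load_in_8bit=False is easy to estimate: with `torch_dtype=torch.float16`, the encoder weights stay at 2 bytes per parameter instead of 1 byte in int8, so their memory footprint roughly doubles. The sketch below is a back-of-the-envelope calculation only; it ignores activations and the small per-row scale tensors that 8-bit quantization keeps, and the parameter count is a placeholder input, not the actual size of the DeepFloyd text encoder. It says nothing about output quality.

```python
def weight_memory_gib(num_params: float, load_in_8bit: bool) -> float:
    """Approximate memory for model weights alone, in GiB.

    int8 stores 1 byte per parameter and fp16 stores 2. Activations and
    quantization scale tensors are ignored.
    """
    bytes_per_param = 1 if load_in_8bit else 2
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    n = 1e9  # placeholder parameter count, not the real T5 encoder size
    print(f"int8: {weight_memory_gib(n, True):.2f} GiB")
    print(f"fp16: {weight_memory_gib(n, False):.2f} GiB")
```

On a card with plenty of VRAM, such as the RTX 4090 in the log above, the extra fp16 weight memory may be affordable; on smaller GPUs it is the main reason to keep 8-bit loading.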