threestudio-project / threestudio

A unified framework for 3D content generation.

Video tutorial on installing Stable Zero123 with threestudio #415

Open · Jordain opened this issue 5 months ago

Jordain commented 5 months ago

https://www.youtube.com/watch?v=jaRr5W80N8E

ArghyaChatterjee commented 5 months ago

Hello @Jordain,

I am using an RTX 3060 GPU with 6GB of VRAM. I was following your tutorial but got the following error:

(three_studio_venv) arghya@arghya-Pulse-GL66-12UEK:~/three_studio/threestudio$ python launch.py --config configs/stable-zero123.yaml --train --gpu 0 system.prompt_processor.prompt="a realistic thor hammer with default texture"
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/controlnet_aux/mediapipe_face/mediapipe_face_common.py:7: UserWarning: The module 'mediapipe' is not installed. The package will have limited functionality. Please install it using the command: pip install 'mediapipe'
  warnings.warn(
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_5m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_5m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_11m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_11m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_224 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_224. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_384 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_384. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/controlnet_aux/segment_anything/modeling/tiny_vit_sam.py:654: UserWarning: Overwriting tiny_vit_21m_512 in registry with controlnet_aux.segment_anything.modeling.tiny_vit_sam.tiny_vit_21m_512. This is because the name being registered conflicts with an existing name. Please check if this is not expected.
  return register_model(fn_wrapper)
Seed set to 0
[INFO] GPU available: True (cuda), used: True
[INFO] TPU available: False, using: 0 TPU cores
[INFO] IPU available: False, using: 0 IPUs
[INFO] HPU available: False, using: 0 HPUs
[INFO] You are using a CUDA device ('NVIDIA GeForce RTX 3060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
/home/arghya/three_studio/threestudio/threestudio/data/image.py:93: UserWarning: Using torch.cross without specifying the dim arg is deprecated.
Please either pass the dim explicitly or simply use torch.linalg.cross.
The default value of dim will change to agree with that of linalg.cross in a future release. (Triggered internally at ../aten/src/ATen/native/Cross.cpp:63.)
  right: Float[Tensor, "1 3"] = F.normalize(torch.cross(lookat, up), dim=-1)
[INFO] single image dataset: load image ./load/images/thorhammer_rgba.png torch.Size([1, 128, 128, 3])
[INFO] single image dataset: load image ./load/images/thorhammer_rgba.png torch.Size([1, 128, 128, 3])
[INFO] LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[INFO] 
  | Name       | Type                          | Params
-------------------------------------------------------------
0 | geometry   | ImplicitVolume                | 12.6 M
1 | material   | DiffuseWithPointLightMaterial | 0     
2 | background | SolidColorBackground          | 0     
3 | renderer   | NeRFVolumeRenderer            | 0     
-------------------------------------------------------------
12.6 M    Trainable params
0         Non-trainable params
12.6 M    Total params
50.450    Total estimated model params size (MB)
[INFO] Validation results will be saved to outputs/zero123-sai/[64, 128, 256]_thorhammer_rgba.png@20240207-192339/save
[INFO] Loading Stable Zero123 ...
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.53 M params.
Keeping EMAs of 688.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
[INFO] Loaded Stable Zero123!
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'train_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.
/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/data_connector.py:441: The 'val_dataloader' does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` to `num_workers=19` in the `DataLoader` to improve performance.
Epoch 0: |                                                                                                                                                                            | 0/? [00:00<?, ?it/s]Traceback (most recent call last):
  File "launch.py", line 301, in <module>
    main(args, extras)
  File "launch.py", line 244, in main
    trainer.fit(system, datamodule=dm, ckpt_path=cfg.resume)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 580, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 989, in _run
    results = self._run_stage()
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1035, in _run_stage
    self.fit_loop.run()
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 202, in run
    self.advance()
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 359, in advance
    self.epoch_loop.run(self._data_fetcher)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 136, in run
    self.advance(data_fetcher)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/training_epoch_loop.py", line 240, in advance
    batch_output = self.automatic_optimization.run(trainer.optimizers[0], batch_idx, kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 187, in run
    self._optimizer_step(batch_idx, closure)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 265, in _optimizer_step
    call._call_lightning_module_hook(
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 157, in _call_lightning_module_hook
    output = fn(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/core/module.py", line 1291, in optimizer_step
    optimizer.step(closure=optimizer_closure)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/core/optimizer.py", line 151, in step
    step_output = self._strategy.optimizer_step(self._optimizer, closure, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 230, in optimizer_step
    return self.precision_plugin.optimizer_step(optimizer, model=model, closure=closure, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision.py", line 117, in optimizer_step
    return optimizer.step(closure=closure, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/optim/optimizer.py", line 385, in wrapper
    out = func(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/optim/optimizer.py", line 76, in _use_grad
    ret = func(self, *args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/optim/adam.py", line 146, in step
    loss = closure()
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/plugins/precision/precision.py", line 104, in _wrap_closure
    closure_result = closure()
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 140, in __call__
    self._result = self.closure(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 126, in closure
    step_output = self._step_fn()
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/loops/optimization/automatic.py", line 315, in _training_step
    training_step_output = call._call_strategy_hook(trainer, "training_step", *kwargs.values())
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 309, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/pytorch_lightning/strategies/strategy.py", line 382, in training_step
    return self.lightning_module.training_step(*args, **kwargs)
  File "/home/arghya/three_studio/threestudio/threestudio/systems/zero123.py", line 236, in training_step
    out = self.training_substep(batch, batch_idx, guidance="zero123")
  File "/home/arghya/three_studio/threestudio/threestudio/systems/zero123.py", line 76, in training_substep
    out = self(batch)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arghya/three_studio/threestudio/threestudio/systems/zero123.py", line 32, in forward
    render_out = self.renderer(**batch)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arghya/three_studio/threestudio/threestudio/models/renderers/nerf_volume_renderer.py", line 407, in forward
    normal_perturb = self.geometry(
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/arghya/three_studio/threestudio/threestudio/models/geometry/implicit_volume.py", line 182, in forward
    normal = -torch.autograd.grad(
  File "/home/arghya/three_studio/three_studio_venv/lib/python3.8/site-packages/torch/autograd/__init__.py", line 411, in grad
    result = Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 52.00 MiB. GPU 0 has a total capacity of 5.80 GiB of which 10.94 MiB is free. Including non-PyTorch memory, this process has 5.77 GiB memory in use. Of the allocated memory 5.43 GiB is allocated by PyTorch, and 13.19 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
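
The last line of the error suggests setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True, though since only ~13 MiB is "reserved but unallocated" here, fragmentation is probably not the real problem. For completeness, a rerun following the error message's own suggestion would look like this (untested sketch):

# Follows the suggestion printed in the OOM message above; this only mitigates
# allocator fragmentation, it does not add VRAM.
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \
python launch.py --config configs/stable-zero123.yaml --train --gpu 0 \
    system.prompt_processor.prompt="a realistic thor hammer with default texture"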

Here is my stable-zero123.yaml configuration file:

name: "zero123-sai"
tag: "${data.random_camera.height}_${rmspace:${basename:${data.image_path}},_}"
exp_root_dir: "outputs"
seed: 0

data_type: "single-image-datamodule"
data: # threestudio/data/image.py -> SingleImageDataModuleConfig
  image_path: ./load/images/thorhammer_rgba.png
  height: [128, 256, 512]
  width: [128, 256, 512]
  resolution_milestones: [200, 300]
  default_elevation_deg: 5.0
  default_azimuth_deg: 0.0
  default_camera_distance: 3.8
  default_fovy_deg: 20.0
  requires_depth: ${cmaxgt0orcmaxgt0:${system.loss.lambda_depth},${system.loss.lambda_depth_rel}}
  requires_normal: ${cmaxgt0:${system.loss.lambda_normal}}
  random_camera: # threestudio/data/uncond.py -> RandomCameraDataModuleConfig
    height: [64, 128, 256]
    width: [64, 128, 256]
    # batch_size: [12, 8, 4] ~ 5 hours
    # batch_size: [8, 4, 2] ~ 50 mins
    # batch_size: [4, 2, 1] ~ 5 mins
    # batch_size: [1, 1, 1] ~ 3 mins

    batch_size: [1, 1, 1]
    resolution_milestones: [200, 300]
    eval_height: 512
    eval_width: 512
    eval_batch_size: 1
    elevation_range: [-10, 80]
    azimuth_range: [-180, 180]
    camera_distance_range: [3.8, 3.8]
    fovy_range: [20.0, 20.0] # Zero123 has fixed fovy
    progressive_until: 0
    camera_perturb: 0.0
    center_perturb: 0.0
    up_perturb: 0.0
    light_position_perturb: 1.0
    light_distance_range: [7.5, 10.0]
    eval_elevation_deg: ${data.default_elevation_deg}
    eval_camera_distance: ${data.default_camera_distance}
    eval_fovy_deg: ${data.default_fovy_deg}
    light_sample_strategy: "dreamfusion"
    batch_uniform_azimuth: False
    n_val_views: 30
    n_test_views: 120

system_type: "zero123-system"
system:
  geometry_type: "implicit-volume"
  geometry:
    radius: 2.0
    normal_type: "analytic"

    # use Magic3D density initialization instead
    density_bias: "blob_magic3d"
    density_activation: softplus
    density_blob_scale: 10.
    density_blob_std: 0.5

    # coarse to fine hash grid encoding
    # to ensure smooth analytic normals
    pos_encoding_config:
      otype: HashGrid
      n_levels: 16
      n_features_per_level: 2
      log2_hashmap_size: 19
      base_resolution: 16
      per_level_scale: 1.447269237440378 # max resolution 4096
    mlp_network_config:
      otype: "VanillaMLP"
      activation: "ReLU"
      output_activation: "none"
      n_neurons: 64
      n_hidden_layers: 2

  material_type: "diffuse-with-point-light-material"
  material:
    ambient_only_steps: 100000
    textureless_prob: 0.05
    albedo_activation: sigmoid

  background_type: "solid-color-background" # unused

  renderer_type: "nerf-volume-renderer"
  renderer:
    radius: ${system.geometry.radius}
    num_samples_per_ray: 512
    return_comp_normal: ${cmaxgt0:${system.loss.lambda_normal_smooth}}
    return_normal_perturb: ${cmaxgt0:${system.loss.lambda_3d_normal_smooth}}

  prompt_processor_type: "dummy-prompt-processor" # Zero123 doesn't use prompts
  prompt_processor:
    pretrained_model_name_or_path: ""
    prompt: ""

  guidance_type: "stable-zero123-guidance"
  guidance:
    pretrained_config: "./load/zero123/sd-objaverse-finetune-c_concat-256.yaml"
    pretrained_model_name_or_path: "./load/zero123/stable_zero123.ckpt"
    vram_O: ${not:${gt0:${system.freq.guidance_eval}}}
    cond_image_path: ${data.image_path}
    cond_elevation_deg: ${data.default_elevation_deg}
    cond_azimuth_deg: ${data.default_azimuth_deg}
    cond_camera_distance: ${data.default_camera_distance}
    guidance_scale: 3.0
    min_step_percent: [50, 0.7, 0.3, 200]  # (start_iter, start_val, end_val, end_iter)
    max_step_percent: [50, 0.98, 0.8, 200]

  freq:
    ref_only_steps: 0
    guidance_eval: 0

  loggers:
    wandb:
      enable: false
      project: "threestudio"
      name: None

  loss:
    lambda_sds: 0.1
    lambda_rgb: [100, 500., 1000., 400]
    lambda_mask: 50.
    lambda_depth: 0. # 0.05
    lambda_depth_rel: 0. # [0, 0, 0.05, 100]
    lambda_normal: 0. # [0, 0, 0.05, 100]
    lambda_normal_smooth: [100, 7.0, 5.0, 150, 10.0, 200]
    lambda_3d_normal_smooth: [100, 7.0, 5.0, 150, 10.0, 200]
    lambda_orient: 1.0
    lambda_sparsity: 0.5 # should be tweaked for every model
    lambda_opaque: 0.5

  optimizer:
    name: Adam
    args:
      lr: 0.01
      betas: [0.9, 0.99]
      eps: 1.e-8

trainer:
  max_steps: 600
  log_every_n_steps: 1
  num_sanity_val_steps: 0
  val_check_interval: 100
  enable_progress_bar: true
  precision: 32

checkpoint:
  save_last: true # save at each validation time
  save_top_k: -1
  every_n_train_steps: 100 # ${trainer.max_steps}

No batch size works. The error above is from a batch size of [1, 1, 1]; I have tried the other batch sizes as well, but they fail the same way. What should I do now?

ArghyaChatterjee commented 5 months ago

Also, I didn't understand one thing. What is the difference between the command shown in your video and the default command in this repo?

python launch.py --config configs/dreamfusion-sd.yaml --train --gpu 0 system.prompt_processor.prompt="a zoomed out DSLR photo of a baby bunny sitting on top of a stack of pancakes"

I am a little new to this field and just want to understand the difference. Thanks in advance.

BTW, the default command in this repo works for me, but for some reason the one from your tutorial does not.

Jordain commented 4 months ago

You might not have enough VRAM to run it. Even with 24GB of VRAM it used a lot of memory. Some people were able to get it running with 12GB of VRAM, but I haven't heard of anyone running it with 6GB.
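
If you still want to experiment on 6GB, one thing you could try is shrinking the rendering workload using the same key=value override syntax as the prompt in your launch command. This is an untested sketch, the keys come straight from the stable-zero123.yaml you posted, and even with these settings it may still not fit:

# Untested sketch: lower the random-camera render resolutions and halve the
# samples per ray to cut activation memory in the NeRF forward/backward pass.
# The quotes keep the shell from interpreting the square brackets.
python launch.py --config configs/stable-zero123.yaml --train --gpu 0 \
    "data.random_camera.height=[32,64,128]" \
    "data.random_camera.width=[32,64,128]" \
    system.renderer.num_samples_per_ray=256

Quality will drop at those resolutions, and nobody I know of has confirmed a working 6GB setup, so treat it as an experiment rather than a fix.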