yangxiaofeng / rectified_flow_prior

Official code for paper: Text-to-Image Rectified Flow as Plug-and-Play Priors

Could this pipeline support using 3D GS as the backbone? #2

Open yiboz2001 opened 1 month ago

yiboz2001 commented 1 month ago

Hi! Given the wonderful results generated with SD3, I wonder whether this pipeline supports using 3D GS as the backbone, to reduce training time while maintaining performance.

yangxiaofeng commented 1 month ago

Hi, thank you for your interest in our work! Yes. For the implementation of the 3dgs backbone, you can check out our previous project here: https://github.com/yangxiaofeng/LODS. To add 3dgs support to this repo, just copy the "gaussiansplatting" folder from that project and create a new system under "threestudio/systems".

Here's an example result generated with the 3dgs backbone. Prompt: "A bear holding a sign that says RFDS."

[Image: A_bear_holding_a_sign_that_says_RFDS]

We're planning to officially support other 3D backbones in future versions, so stay tuned!

yiboz2001 commented 1 month ago

Thanks for the reply! I can't wait to try it.

yiboz2001 commented 4 weeks ago

Hi! I have built the corresponding system and config file, but the result failed. Could you provide the config you used? Here is mine:

name: "rfds-rev-sd3-3d-gs"
tag: "${rmspace:${system.prompt_processor.prompt},_}"
exp_root_dir: "outputs"
seed: 0

data_type: "random-camera-datamodule-gs"
data:
  batch_size: 8
  eval_camera_distance: 4.0
  camera_distance_range: [2.5, 4.0]
  light_sample_strategy: "dreamfusion3dgs"
  height: 512
  width: 512
  eval_height: 512
  eval_width: 512
  fovy_range: [40, 70]
  elevation_range: [-10, 45]

system_type: "rf-gs-system"
system:
  radius: ${data.eval_camera_distance}

  prompt_processor_type: "sd3-prompt-processor"
  prompt_processor:
    pretrained_model_name_or_path: ""
    prompt: ???
    negative_prompt: "unrealistic, blurry, low quality, oversaturation."
    front_threshold: 30.
    back_threshold: 30.

  guidance_type: "RFDS-Rev-sd3"
  guidance:
    pretrained_model_name_or_path: ""
    guidance_scale: 100
    min_step_percent: [1000, 0.3, 0.3, 1001]
    max_step_percent: [1000, 0.98, 0.5, 1001]

  loggers:
    wandb:
      enable: false
      project: 'threestudio'
      name: None

  loss:
    lambda_rfds: 1
    lambda_orient: 0.
    lambda_sparsity: 0.2
    lambda_opaque: 0.1
    lambda_z_variance: 0.

  optimizer:
    name: AdamW
    args:
      lr: 0.001
      betas: [0.9, 0.99]
      eps: 1.e-15

trainer:
  max_steps: 2500
  log_every_n_steps: 1
  num_sanity_val_steps: 0
  val_check_interval: 100
  enable_progress_bar: true
  precision: 16-mixed

checkpoint:
  save_last: true # save at each validation time
  save_top_k: -1
  every_n_train_steps: ${trainer.max_steps}

# white_background: true
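
For readers unfamiliar with the list notation used for `min_step_percent` and `max_step_percent` in the config above, here is a minimal sketch of how such a value could be evaluated. It assumes the guidance reads the list as `[start_step, start_value, end_value, end_step]` with linear interpolation, as threestudio's scheduling helper does for integer steps. The function name `scheduled_value` is hypothetical and is not part of this repo.

```python
def scheduled_value(spec, global_step):
    """Evaluate a scalar or a [start_step, start_value, end_value, end_step] spec.

    Assumption: linear interpolation between start_step and end_step,
    clamped to start_value before the ramp and end_value after it.
    """
    if isinstance(spec, (int, float)):
        return float(spec)
    start_step, start_value, end_value, end_step = spec
    progress = (global_step - start_step) / (end_step - start_step)
    progress = max(0.0, min(1.0, progress))  # clamp outside the ramp
    return start_value + (end_value - start_value) * progress


# Values copied from the config above.
min_step_percent = [1000, 0.3, 0.3, 1001]
max_step_percent = [1000, 0.98, 0.5, 1001]

for step in (0, 1000, 1001, 2500):
    low = scheduled_value(min_step_percent, step)
    high = scheduled_value(max_step_percent, step)
    print(f"step {step}: min={low:.2f} max={high:.2f}")
```

Under that assumption, the upper bound stays at 0.98 through step 1000 and drops to about 0.5 from step 1001 onward, while the lower bound stays at 0.3 throughout the 2500 training steps.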

yangxiaofeng commented 4 weeks ago

Hi, what prompt are you using? Consider using a less complicated prompt for the shap-e initialization. For example, if your prompt is "A bear holding a sign that says Welcome.", the prompt for shap-e should be "a bear".
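
The suggestion above can be sketched as a small helper that derives a shorter shap-e prompt from the full text prompt. This is only a rough heuristic for illustration: the function name `simplify_prompt_for_shap_e` and the list of cut words are hypothetical and not part of this repo. In practice, writing the short prompt by hand is likely more reliable.

```python
import re

# Hypothetical cut words: the first one found ends the "main subject" part.
CUT_WORDS = ("holding", "wearing", "with", "that", "which", "on", "in", "next", "and")


def simplify_prompt_for_shap_e(prompt: str) -> str:
    """Keep only the leading words of a prompt, up to the first cut word."""
    words = re.findall(r"[A-Za-z0-9'-]+", prompt.lower())
    kept = []
    for word in words:
        if word in CUT_WORDS:
            break
        kept.append(word)
    # Fall back to the full prompt if the heuristic would remove everything.
    return " ".join(kept) if kept else prompt.strip()


print(simplify_prompt_for_shap_e("A bear holding a sign that says Welcome."))
# prints "a bear"
```

The full prompt would still be used for the SD3 guidance. Only the initialization would use the shortened version.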