ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation

Zhiyuan Ma Yuxiang Wei Yabin Zhang Xiangyu Zhu Zhen Lei Lei Zhang

[[Paper]](https://arxiv.org/pdf/2407.02040) | [[Project Page]](https://sites.google.com/view/scaledreamer-release/) ---

🔥 News

2024.07.02 ScaleDreamer is accepted by ECCV 2024
2024.06.23 Create this repo.

⚙️ Dependencies and Installation

Follow threestudio to set up the conda environment, or use our provided instructions as below.

- Create a virtual environment: ```sh conda create -n scaledreamer python=3.10 conda activate scaledreamer ``` - Install PyTorch ```sh # Prefer using the latest version of CUDA and PyTorch conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia ``` - (Optional, Recommended) Install [xFormers](https://github.com/facebookresearch/xformers) for attention acceleration. ```sh conda install xformers -c xformers ``` - (Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions: ```sh pip install ninja ``` - Install major dependencies: ```sh pip install -r requirements.txt ``` - Install [iNGP](https://github.com/NVlabs/instant-ngp) and [NerfAcc](https://github.com/nerfstudio-project/nerfacc): ```sh export PATH="/usr/local/cuda/bin:$PATH" export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH" pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch pip install git+https://github.com/KAIR-BAIR/nerfacc.git@v0.5.2 ``` If you encounter errors while installing iNGP, it is recommended to check your gcc version. Follow these instructions to change the gcc version within your conda environment. Then return to the repository directory to install iNGP and NerfAcc ⬆️ again. ```sh conda install -c conda-forge gxx=9.5.0 cd $CONDA_PREFIX/lib ln -s /usr/lib/x86_64-linux-gnu/libcuda.so ./ cd ```

Download 2D Diffusion Priors.

- Save [SD-v2.1-base](https://huggingface.co/stabilityai/stable-diffusion-2-1-base) and [MVDream](https://mv-dream.github.io/) to the local directory `pretrained`. ``` python scripts/download_pretrained_models.py ```

🌈 Prompt-Specific 3D Generation

ASD with SD (Stable Diffusion). Feel free to change the prompt accordingly.
```
sh scripts/single-prompt-benchmark/asd_sd_nerf.sh
```
ASD with MV (MVDream). Feel free to change the prompt accordingly.
```
sh scripts/single-prompt-benchmark/asd_mv_nerf.sh
```

🚀 Prompt-Amortized 3D Generator Tranining

The following 3D generator architectures are available:

Network	Description	File
Hyper-iNGP	iNGP with text-conditioned linear layers,adopted from ATT3D	geometry, background
3DConv-net	A StyleGAN generator that outputs voxels with 3D convolution, code adopted from CC3D	geometry, architecture
Triplane-Transformer	Transformer-based 3D Generator, with Triplane as the output structure, adopted from LRM	geometry, architecture

The following corpus datasets are available:

Corpus	Description	File
MG15	15 text pormpts from Magic3D project page	json
DF415	415 text pormpts from DreamFusion project page	json
AT2520	2520 text pormpts from ATT3D experiments	json
DL17k	17k text pormpts from Instant3D release	json
CP100k	110k text pormpts from Cap3D dataset	json

Run the following script to start training

Hyper-iNGP with SD on MG15

sh scripts/multi-prompt-benchmark/asd_sd_hyper_iNGP_MG15.sh

3DConv-net with SD on DF415

sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_DF415.sh

3DConv-net with SD on AT2520

sh scripts/multi-prompt-benchmark/asd_sd_3dconv_net_AT2520.sh

Triplane-Transformer with MV on DL17k

sh scripts/multi-prompt-benchmark/asd_mv_triplane_transformer_DL17k.sh

3DConv-net with SD on CP100k

scripts/multi-prompt-benchmark/asd_sd_3dconv_net_CP100k.sh

📷 Prompt-Amortized 3D Generator Evaluation

Create a directory to save the checkpoints

mkdir pretrained/3d_checkpoints

The checkpoints of the ⬆️ experiments are available. Save the corresponding .pth file to 3d_checkpoint, then run the scripts as below.

Hyper-iNGP with SD on MG15. The ckpt in Google Drive

sh scripts/multi_prompts_benchmark_evaluation/asd_sd_hyper_iNGP_MG15.sh

3DConv-net with SD on DF415. The ckpt in Google Drive

sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_DF415.sh

3DConv-net with SD on AT2520. The ckpt in Google Drive

sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_AT2520.sh

Triplane-Transformer with MV on DL17k. The ckpt in Google Drive

sh scripts/multi_prompts_benchmark_evaluation/asd_mv_triplane_transformer_DL17k.sh

3DConv-net with SD on CP100k. The ckpt in Google Drive

sh scripts/multi_prompts_benchmark_evaluation/asd_sd_3dconv_net_CP100k.sh

The rendered images and videos are saved in outputs/<experiment_name>/save/<num_iter> directory. Compute the metrics with CLIP via

python evaluation/CLIP/evaluation_amortized.py --result_dir <video_dir>

🕹️ Create Your Own Modules

3D Generator

Place the code in custom/amortized/models/geometry, check out the other code in that directory for reference.
Update your in custom/amortized/models/geometry/__init__.py
Create your own config file, type in your registered module name in the system.geometry_type argument, check out the other code in the configs/multi-prompt_benchmark directory for reference.

2D Diffusion Guidance

Put your code in threestudio/models/guidance, take a look at the other code in that directory or other guidance for reference.
Update your in threestudio/models/guidance/__init__.py
Create your own config file, type in your registered module name in the system.guidance_type argument, take a look at the other code in the configs/multi-prompt_benchmark directory for reference.

Text corpus

Create a JSON file that lists the training, validation, and test text prompts in the load directory
Enter the name of this JSON file into the system.prompt_processor.prompt_library argument to set up the corpus, take other commands in the scripts directory for reference

You can also add your modules for data, renderer, prompt_processor, etc.

📖 Citation

If you find this paper helpful, please cite

@article{ma2024scaledreamer,
  title={ScaleDreamer: Scalable Text-to-3D Synthesis with Asynchronous Score Distillation},
  author={Ma, Zhiyuan and Wei, Yuxiang and Zhang, Yabin and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
  journal={arXiv preprint arXiv:2407.02040},
  year={2024}
}

🙏 Acknowledgement

threestudio, a clean and extensible codebase for text-to-3D.
MVDream-threestudio, the implementation of MVDream for text-to-3D.
OpenLRM, the implementation of LRM. We develop the 3D generator of Triplane-Transformer on top of it.
Cap3D, which provides the text caption of Objaverse. We develop the corpus of CP100k on top of it.

theEricMa / ScaleDreamer

readme