Zilong Chen1,2, Yikai Wang1, Feng Wang1, Zhengyi Wang1,2, Huaping Liu1
1Tsinghua University, 2ShengShu
This repository contains the official implementation of V3D: Video Diffusion Models are Effective 3D Generators.
[2024.3.14] Our demo is now available [here](https://huggingface.co/spaces/heheyas/V3D). We will add more checkpoints and examples soon.
We are working on making everything fully public (refactoring the code, uploading weights, etc.), so please be patient.
Single Image to 3D
Generated Multi-views
https://github.com/heheyas/V3D/assets/44675551/bb724ed1-b9a6-4aa7-9a49-f1a8c8756c2f
https://github.com/heheyas/V3D/assets/44675551/4bfaea91-6c5b-40da-8682-30286a916979
Reconstructed 3D Gaussian Splats
https://github.com/heheyas/V3D/assets/44675551/894444eb-a454-4bc9-921b-cd0d5764a14d
https://github.com/heheyas/V3D/assets/44675551/eda05891-e2c7-4f44-af12-9ccd0bce61d1
https://github.com/heheyas/V3D/assets/44675551/27d61245-b416-4289-ba98-97219ad199a3
https://github.com/heheyas/V3D/assets/44675551/e94d71ff-b8bc-410c-ad2c-3cfb1fbef7fa
https://github.com/heheyas/V3D/assets/44675551/a0d1e971-0f8f-4f05-a73e-45271e37a31f
https://github.com/heheyas/V3D/assets/44675551/0dac3189-fc59-4e9b-8151-10ebe2711d71
Sparse view scene generation (on CO3D hydrant category)
https://github.com/heheyas/V3D/assets/44675551/33c87468-b6c0-4fa2-a9bf-6f396b3fa089
https://github.com/heheyas/V3D/assets/44675551/3c03d015-2e56-44de-8210-e33e7ec810bb
https://github.com/heheyas/V3D/assets/44675551/1e73958b-04b2-4faa-bbc3-675399f21956
https://github.com/heheyas/V3D/assets/44675551/f70cc259-7d50-4bf9-9c1b-0d4143ae8958
https://github.com/heheyas/V3D/assets/44675551/f6407b02-5ee7-4f8f-8559-4a893e6fd912
pip install -r requirements.txt
mkdir -p ckpts  # create the checkpoint directory first; wget -O fails if it is missing
wget https://huggingface.co/heheyas/V3D/resolve/main/V3D.ckpt -O ckpts/V3D_512.ckpt
wget https://huggingface.co/stabilityai/stable-video-diffusion-img2vid-xt/resolve/main/svd_xt.safetensors -O ckpts/svd_xt.safetensors
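Optionally, sanity-check the downloads before running anything (a minimal sketch, assuming the V3D file is a standard PyTorch checkpoint and that safetensors is installed via requirements.txt):

python - <<'EOF'
import torch
from safetensors.torch import load_file

# Load both weight files on CPU just to confirm they are intact and parseable.
ckpt = torch.load("ckpts/V3D_512.ckpt", map_location="cpu")
svd = load_file("ckpts/svd_xt.safetensors")
print("V3D entries:", len(ckpt.get("state_dict", ckpt)))
print("SVD tensors:", len(svd))
EOF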
PYTHONPATH="." python scripts/pub/V3D_512.py --input_path <image file or dir> --save --border_ratio 0.3 --min_guidance_scale 4.5 --max_guidance_scale 4.5 --output-folder <output-dest>
PYTHONPATH="." python recon/train_from_vid.py -w --sh_degree 0 --iterations 4000 --lambda_dssim 1.0 --lambda_lpips 2.0 --save_iterations 4000 --num_pts 100_000 --video <your generated video>
Or reconstruct a mesh with NeuS using instant-nsr-pl:
cd mesh_recon
PYTHONPATH="." python launch.py --config configs/videonvs.yaml --gpu <gpu> --train system.loss.lambda_normal=0.1 dataset.scene=<scene_name> dataset.root_dir=<output_dir> dataset.img_wh='[512, 512]'
Refine texture
python refine.py --mesh <your obj mesh file> --scene <your video> --num-opt 16 --lpips 1.0 --iters 500
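For example (hypothetical paths), refining the texture of the mesh extracted in the previous step against the generated orbit video:

# Hypothetical invocation; flags kept exactly as documented above
python refine.py --mesh outputs/chair.obj --scene outputs/chair.mp4 \
    --num-opt 16 --lpips 1.0 --iters 500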
This codebase is built upon awesome open-source projects, including Stable Video Diffusion, 3D Gaussian Splatting, and instant-nsr-pl.
Thanks to the authors for their remarkable work!