jjihwan / SV3D-fine-tune
MIT License

SV3D fine-tuning

Fine-tuning code for SV3D

[Figure: input image, output before training, output after training]

Setting up

Requires PyTorch 2.0.

conda create -n sv3d python==3.10.14
conda activate sv3d
pip3 install -r requirements.txt

Install deepspeed for training

pip3 install deepspeed
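After installing, a quick sanity check can confirm that the key dependencies resolve in the active environment. This is an optional sketch, not part of the repository; it only probes whether the modules are importable:

```python
# Sanity-check sketch: report the interpreter version and whether the
# main training dependencies can be found in the active environment.
import importlib.util
import sys

print("python:", sys.version.split()[0])
for mod in ("torch", "deepspeed"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'found' if found else 'missing'}")
```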

Get checkpoints πŸ’Ύ

Store them in the following structure:

cd SV3D-fine-tuning
    .
    └── checkpoints
        └── sv3d_p.safetensors
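A small script can verify that the checkpoint sits where the training script expects it. This is an illustrative sketch (the path layout matches the tree above; the helper is not part of the repository):

```python
import os

# Sketch: confirm the checkpoint is in the expected location before training.
ckpt = os.path.join("checkpoints", "sv3d_p.safetensors")
if os.path.isfile(ckpt):
    print(f"found {ckpt} ({os.path.getsize(ckpt) / 1e9:.1f} GB)")
else:
    print(f"missing {ckpt} -- place sv3d_p.safetensors under checkpoints/ first")
```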

Dataset πŸ“€

Prepare the dataset as follows. We use the Objaverse 1.0 dataset with a preprocessing pipeline; see the objaverse dataloader for details. orbit_frame_0020.png is the input image, and video_latent.pt is the video latent encoded by the SV3D encoder without regularization (i.e. it has 8 channels).

cd dataset
    .
    β”œβ”€β”€ 000-000
    β”‚   β”œβ”€β”€ orbit_frame_0020.png # input image
    β”‚   └── video_latent.pt      # video latent
    β”œβ”€β”€ 000-001
    β”‚   β”œβ”€β”€ orbit_frame_0020.png
    β”‚   └── video_latent.pt
    └── ...
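The layout above can be scanned into (image, latent) pairs before building a dataloader. The sketch below is a hypothetical helper, not code from the repository; it demonstrates the scan on a throwaway directory that mimics the structure:

```python
import os
import tempfile

def collect_samples(root):
    """Collect (image_path, latent_path) pairs from sample directories
    that contain both files (hypothetical helper, not from the repo)."""
    samples = []
    for name in sorted(os.listdir(root)):
        sample_dir = os.path.join(root, name)
        image = os.path.join(sample_dir, "orbit_frame_0020.png")
        latent = os.path.join(sample_dir, "video_latent.pt")
        if os.path.isfile(image) and os.path.isfile(latent):
            samples.append((image, latent))
    return samples

# Demonstrate on a temporary directory that mimics the layout above.
root = tempfile.mkdtemp()
for name in ("000-000", "000-001"):
    d = os.path.join(root, name)
    os.makedirs(d)
    open(os.path.join(d, "orbit_frame_0020.png"), "wb").close()
    open(os.path.join(d, "video_latent.pt"), "wb").close()

pairs = collect_samples(root)
print(len(pairs))  # -> 2
```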

Training πŸš€

I used a single A6000 GPU (48 GB VRAM) for fine-tuning.

sh scripts/sv3d_finetune.sh
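DeepSpeed reads its settings from a JSON config. The repository ships its own configuration via the script above; the fragment below is only an illustrative sketch of the kind of ZeRO stage-2 settings that fit a single 48 GB card (all values are assumptions, not the repo's actual config):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 8,
  "gradient_clipping": 1.0,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": { "device": "cpu" }
  }
}
```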

Inference ❄️

Store the input images in the assets directory, then run:

sh scripts/inference.sh


Acknowledgement πŸ€—

The source code is based on SV3D. Thanks for the wonderful codebase!

Additionally, GPU and NFS resources for training are supported by fal.ai πŸ”₯.

Feel free to refer to the fal Research Grants!