Fine-tuning code for SV3D
Demo table: Input Image | Before Training | After Training (images omitted here).
conda create -n sv3d python==3.10.14
conda activate sv3d
pip3 install -r requirements.txt
We use DeepSpeed for training:

pip3 install deepspeed
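DeepSpeed is driven by a JSON config. The repository's actual settings live in its training scripts; as a rough sketch, a minimal config for a memory-constrained single-GPU run might look like the following (all values are illustrative assumptions, not the repo's settings):

```python
import json

# Sketch of a minimal DeepSpeed config. Values are illustrative
# assumptions, not the repository's actual training settings.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # video latents are large
    "gradient_accumulation_steps": 4,     # assumed effective batch of 4
    "zero_optimization": {"stage": 2},    # ZeRO-2 shards optimizer state and grads
    "bf16": {"enabled": True},            # mixed precision to fit in 48 GB VRAM
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

ZeRO-2 plus bf16 is a common way to fit diffusion fine-tuning on a single 48 GB card, which matches the hardware used below.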
Store the checkpoint in the following structure:
cd SV3D-fine-tuning
.
└── checkpoints
    └── sv3d_p.safetensors
Prepare the dataset as follows. We use the Objaverse 1.0 dataset with a preprocessing pipeline; see the Objaverse dataloader for details.
orbit_frame_0020.png is the input image, and video_latent.pt is the video latent encoded by the SV3D encoder without regularization (i.e., it has 8 channels).
cd dataset
.
├── 000-000
│   ├── orbit_frame_0020.png # input image
│   └── orbit_frame.pt # video latent
├── 000-001
│   ├── orbit_frame_0020.png
│   └── orbit_frame.pt
└── ...
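The layout above can be sanity-checked before training. Here is a small sketch that collects valid sample directories; the two required file names come from the tree above, while the helper itself is hypothetical (the actual loading logic is in the repository's Objaverse dataloader):

```python
from pathlib import Path

def collect_samples(root):
    """Return sample dirs under `root` that contain both the input
    frame and the precomputed video latent (names from the tree above)."""
    required = {"orbit_frame_0020.png", "orbit_frame.pt"}
    samples = []
    for d in sorted(Path(root).iterdir()):
        if d.is_dir() and required.issubset(p.name for p in d.iterdir()):
            samples.append(d)
    return samples
```

A training dataloader would then read the PNG for conditioning and torch.load the .pt latent; incomplete sample directories are skipped rather than crashing mid-epoch.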
I used a single A6000 GPU (48 GB VRAM) for fine-tuning.
sh scripts/sv3d_finetune.sh
Store the input images in the assets directory, then run:
sh scripts/inference.sh
The source code is based on SV3D. Thanks for the wonderful codebase!
Additionally, GPU and NFS resources for training are supported by fal.ai 🔥.
Feel free to refer to the fal Research Grants!