showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0

Why does the sample generated by the accelerate command look much better than the gif generated by the inference script? #60

Closed heptagonnnn closed 1 year ago

heptagonnnn commented 1 year ago

Prompt: "Iron Man is surfing in the desert"

Sample generated by the accelerate command: [gif]

Sample generated by the inference script: [gif]

I use an RTX 3060, torch 1.12, CUDA 11.6, without triton. My train config is as follows:

pretrained_model_path: "./checkpoints/stable-diffusion-v1-4"
output_dir: "./outputs/man-surfing"

train_data:
  video_path: "data/man-surfing.mp4"
  prompt: "a man is surfing"
  n_sample_frames: 8
  width: 512
  height: 512
  sample_start_idx: 0
  sample_frame_rate: 1

validation_data:
  prompts:

learning_rate: 3e-5
train_batch_size: 1
max_train_steps: 500
checkpointing_steps: 1000
validation_steps: 100
trainable_modules:

seed: 33
mixed_precision: fp16
use_8bit_adam: False
gradient_checkpointing: True
enable_xformers_memory_efficient_attention: True

zhangjiewu commented 1 year ago

It could be due to the different sampler used at the inference stage. You may try specifying the DDIM scheduler as follows.

...
import torch
from diffusers import DDIMScheduler
from tuneavideo.models.unet import UNet3DConditionModel
from tuneavideo.pipelines.pipeline_tuneavideo import TuneAVideoPipeline

# base Stable Diffusion weights and the fine-tuned Tune-A-Video output
pretrained_model_path = "./checkpoints/stable-diffusion-v1-4"
my_model_path = "./outputs/man-surfing"

# load the fine-tuned 3D UNet, then build the pipeline with an explicit DDIM scheduler
unet = UNet3DConditionModel.from_pretrained(my_model_path, subfolder='unet', torch_dtype=torch.float16).to('cuda')
scheduler = DDIMScheduler.from_pretrained(pretrained_model_path, subfolder='scheduler')
pipe = TuneAVideoPipeline.from_pretrained(pretrained_model_path, unet=unet, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")

...
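For completeness, here is a minimal sketch of running the rebuilt pipeline and saving a GIF, following the inference usage shown in the repo README; the prompt, frame count, and sampling parameters below are illustrative rather than the exact values from the original report.

from tuneavideo.util import save_videos_grid

prompt = "Iron Man is surfing in the desert"
# 8 frames at 512x512 to mirror the training config above; steps and guidance scale are typical defaults
video = pipe(prompt, video_length=8, height=512, width=512, num_inference_steps=50, guidance_scale=12.5).videos
save_videos_grid(video, f"./{prompt}.gif")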