Closed · heptagonnnn closed this 1 year ago
It could be due to the different sampler used at inference time. You may try specifying the DDIM sampler as follows:

```python
import torch
from diffusers import DDIMScheduler
from tuneavideo.models.unet import UNet3DConditionModel
from tuneavideo.pipelines.pipeline_tuneavideo import TuneAVideoPipeline

...
pretrained_model_path = "./checkpoints/stable-diffusion-v1-4"
my_model_path = "./outputs/man-surfing"

unet = UNet3DConditionModel.from_pretrained(my_model_path, subfolder="unet", torch_dtype=torch.float16).to("cuda")
scheduler = DDIMScheduler.from_pretrained(pretrained_model_path, subfolder="scheduler")
pipe = TuneAVideoPipeline.from_pretrained(pretrained_model_path, unet=unet, scheduler=scheduler, torch_dtype=torch.float16).to("cuda")
...
```
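For context on why the sampler choice changes the output, the deterministic DDIM update (η = 0) can be sketched in a few lines. This is a toy scalar illustration of the update rule, not the diffusers implementation; the variable names are my own.

```python
import math

# Toy, scalar sketch of the deterministic DDIM update (eta = 0).
# alpha_bar_t / alpha_bar_prev are cumulative noise-schedule terms,
# eps is the model's noise prediction. Illustration only.
def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    # Predict the clean sample x0 from the noisy sample and the noise estimate.
    x0_pred = (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    # Re-noise x0 deterministically to the previous timestep (no random term).
    return math.sqrt(alpha_bar_prev) * x0_pred + math.sqrt(1 - alpha_bar_prev) * eps

# If eps is the exact noise that produced x_t, stepping to alpha_bar_prev = 1
# recovers the clean sample x0.
x0, eps, a_t = 0.5, -1.3, 0.4
x_t = math.sqrt(a_t) * x0 + math.sqrt(1 - a_t) * eps
print(ddim_step(x_t, eps, a_t, 1.0))  # ≈ 0.5
```

Because each step is deterministic given the noise prediction, DDIM reproduces the sampling trajectory used during validation, whereas a different (stochastic or differently discretized) scheduler will drift to a different result even from the same seed.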
Prompt: "Iron Man is surfing in the desert"

Sample generated by the `accelerate` command:
![Iron Man is surfing in the desert](https://user-images.githubusercontent.com/26117555/233827161-551b7da2-c2cd-4afc-a43e-5dad7d6c5987.gif)

Generated by the inference script:
![Iron Man is surfing in the desert](https://user-images.githubusercontent.com/26117555/233827200-e5c84d78-84f7-41d9-bd3c-9d86108b915f.gif)
I use an RTX 3060, torch 1.12, CUDA 11.6, without triton. My train config looks like this:
```yaml
pretrained_model_path: "./checkpoints/stable-diffusion-v1-4"
output_dir: "./outputs/man-surfing"

train_data:
  video_path: "data/man-surfing.mp4"
  prompt: "a man is surfing"
  n_sample_frames: 8
  width: 512
  height: 512
  sample_start_idx: 0
  sample_frame_rate: 1

validation_data:
  prompts:

learning_rate: 3e-5
train_batch_size: 1
max_train_steps: 500
checkpointing_steps: 1000
validation_steps: 100
trainable_modules:

seed: 33
mixed_precision: fp16
use_8bit_adam: False
gradient_checkpointing: True
enable_xformers_memory_efficient_attention: True
```
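As a side note on the config, the frame-sampling fields select training frames by strided indexing. A minimal sketch of that selection, assuming the standard start/stride/count scheme (the function name is mine, not from the repo):

```python
# Hedged sketch: how sample_start_idx, sample_frame_rate and n_sample_frames
# pick training frames from a video of total_frames frames (strided sampling).
def sample_frame_indices(total_frames, n_sample_frames=8,
                         sample_start_idx=0, sample_frame_rate=1):
    indices = list(range(sample_start_idx, total_frames, sample_frame_rate))
    return indices[:n_sample_frames]

# With the config above (start 0, rate 1, 8 frames) the first 8 frames are used:
print(sample_frame_indices(total_frames=120))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Raising `sample_frame_rate` spreads the same 8 frames over a longer stretch of the clip, which changes the motion the model is fine-tuned on.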