showlab / Tune-A-Video

[ICCV 2023] Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
https://tuneavideo.github.io
Apache License 2.0
4.15k stars, 377 forks

Why is the performance of my model becoming worse as I continue training? #57

Closed zhangtao22 closed 1 year ago

zhangtao22 commented 1 year ago

The output looks fine at 100 steps, but it gets worse the longer training runs. Do I need to adjust something after reducing the height and width of the input? My config is:

train_data:
  video_path: "data/man-skiing.mp4"
  prompt: "a man is skiing"
  n_sample_frames: 24
  width: 256
  height: 256
  sample_start_idx: 0
  sample_frame_rate: 2

validation_data:
  prompts:

learning_rate: 3e-5
train_batch_size: 1
max_train_steps: 500
checkpointing_steps: 1000
validation_steps: 100
trainable_modules:

seed: 33
mixed_precision: fp16
use_8bit_adam: False
gradient_checkpointing: True
enable_xformers_memory_efficient_attention: False

zhangjiewu commented 1 year ago

We have also noticed that our models tend to perform worse when trained on lower-resolution videos (e.g., 256 x 256). Our hypothesis is that this comes from the pretrained SD models, which were trained on higher-resolution images (e.g., 512 x 512). We recommend using higher-resolution videos, such as 384 x 384 or 512 x 512, for better performance.
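To make the resolution point concrete, here is a minimal sketch (not part of the Tune-A-Video codebase) that checks a `width`/`height` pair before training. The helper names `latent_size` and `check_resolution` are hypothetical, the 384 threshold is simply the lower of the two resolutions recommended above, and the factor of 8 assumes the standard Stable Diffusion VAE, which downsamples each spatial dimension by 8.

```python
# Assumption: the standard Stable Diffusion VAE downsamples by 8 per side.
VAE_SCALE_FACTOR = 8
# Assumption: 384 is taken from the recommendation in the reply above.
RECOMMENDED_MIN_SIDE = 384


def latent_size(width, height, scale=VAE_SCALE_FACTOR):
    """Return the (width, height) of the latent grid the UNet would see."""
    if width % scale or height % scale:
        raise ValueError(f"width and height must be multiples of {scale}")
    return width // scale, height // scale


def check_resolution(width, height, recommended_min=RECOMMENDED_MIN_SIDE):
    """Return a list of warning strings; an empty list means no concerns."""
    warnings = []
    if min(width, height) < recommended_min:
        lat_w, lat_h = latent_size(width, height)
        warnings.append(
            f"{width}x{height} gives a {lat_w}x{lat_h} latent grid; "
            f"consider at least {recommended_min}x{recommended_min}."
        )
    return warnings


if __name__ == "__main__":
    # The config in this issue uses 256 x 256.
    for message in check_resolution(256, 256):
        print("warning:", message)
```

At 256 x 256 the latent grid is a quarter of the area of the grid at 512 x 512, which is consistent with the hypothesis that the pretrained weights are being used outside the resolution they were trained on.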
