I have a few questions about your training process.
(1) Did you fix the number of frames (clips) at 24? In every config file the clip length is consistently 24. Does that imply that any number larger or smaller than 24 performs worse than 24?
(2) During training, do you shuffle the order of the frames (clips)? My feeling is that shuffling would be improper, since the frame-level attention layers also learn the order of the frames.
Hi @HyeonHo99, thank you for your interest in our work. Below are some comments regarding your questions:
We set the number of frames to 24 so that the code can run on a 24GB GPU. Feel free to explore other choices of video length.
In our experiments we did not shuffle the order of the frames. If you want to try video-image co-training, you can disable the temporal components when training on images.
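To illustrate the idea, here is a minimal plain-Python sketch (not the repo's actual API; all function names are illustrative) of how a forward pass can bypass the temporal components when a batch contains single images rather than video clips:

```python
# Sketch of video-image co-training: spatial layers run on every frame,
# while temporal layers (which mix information across frames) are skipped
# when the input is a single image. Names here are hypothetical.

def spatial_block(frames):
    """Stand-in for a spatial layer: applied to each frame independently."""
    return [f + 1 for f in frames]

def temporal_block(frames):
    """Stand-in for a temporal layer: mixes information across frames,
    so it is only meaningful when there is more than one frame."""
    mean = sum(frames) / len(frames)
    return [f + mean for f in frames]

def forward(frames, use_temporal=True):
    """Always apply spatial layers; apply temporal layers only for videos."""
    out = spatial_block(frames)
    if use_temporal and len(out) > 1:
        out = temporal_block(out)
    return out

video = [0.0, 1.0, 2.0]    # a 3-frame clip: temporal mixing is applied
image = [0.0]              # a single image: temporal block is skipped
```

In a real implementation the same effect is usually achieved by freezing or short-circuiting the temporal attention/convolution modules for image batches, so both data sources share the spatial weights.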