Closed by yjhong89 19 hours ago
Here are the answers to your questions:
1. `latents_list[i_s][index::column_size]` aims to get a batch of samples that belong to the same stage.
2. `video_sync_group` is for controlling the group of processes that accept the same input sample.
3. The `frames` key means you can specify the frame indexes you want to extract.
Thanks!
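To illustrate the first answer, here is a minimal sketch of what a strided slice like `[index::column_size]` does along the batch dimension. It uses a plain Python list as a stand-in for the tensor, and the batch size of 8 and `column_size` of 4 are made-up values for illustration, not taken from the repository. Indexing a `[bs, c, t, h, w]` tensor this way follows the same rule on its first dimension.

```python
# Toy stand-in for the batch dimension of latents_list[i_s]; each entry
# represents one sample of shape [c, t, h, w] (contents omitted here).
batch = [f"sample_{i}" for i in range(8)]
column_size = 4  # hypothetical value chosen for illustration

# x[index::column_size] keeps every column_size-th sample, starting at index,
# so each slice is a subset of the batch rather than a single sample.
groups = [batch[index::column_size] for index in range(column_size)]

for index, group in enumerate(groups):
    print(index, group)
```

With these toy values, index 0 selects `sample_0` and `sample_4`, index 1 selects `sample_1` and `sample_5`, and so on, so every sample lands in exactly one slice.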
Another question:
Theoretically, it naturally performs I2V training during autoregressive training (since the first frame is an image). However, we have not explicitly optimized for I2V, so the performance may be suboptimal. We are working on some improvements and will share them in due time.
Yes, that sounds right. Autoregressive training naturally performs I2V training.
Another question?
Great observation! Please refer to https://github.com/jy0205/Pyramid-Flow/issues/28#issuecomment-2406892327.
Thanks for the quick answer!
Hi! Thanks for sharing the training code!
While analyzing the implementation in detail, I came up with a few questions.
1. What does `[index::column_size]` mean at https://github.com/jy0205/Pyramid-Flow/blob/e4b02ef31edba13e509896388b1fedd502ea767c/pyramid_dit/pyramid_dit_for_video_gen_pipeline.py#L451? Since `latents_list[i_s]` would have a shape of `[bs, c, t, h, w]`, `latents_list[i_s][index::column_size]` just selects one batch, doesn't it?
2. How does the video sync group work?
3. When extracting video latents in advance, do all videos have the same fps? I ask because this line means that if "frame" is not specified in the annotation, the first 121 frames are extracted.
4. Why multiply by 2 here? Is it to preserve the variance at each stage?
Thanks!
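For readers following the frame-extraction question, the behaviour described in this thread (use the indexes given under a `frames` key, otherwise fall back to the first 121 frames) could be sketched roughly as below. This is not the repository's code: `select_frame_indexes`, the annotation dictionaries, and the clamping to the video length are all assumptions made for illustration.

```python
DEFAULT_NUM_FRAMES = 121  # from the thread: the first 121 frames are the default


def select_frame_indexes(annotation, total_frames):
    """Return the frame indexes to extract for one video (hypothetical helper)."""
    if "frames" in annotation:
        # The annotation lists the frame indexes explicitly.
        return list(annotation["frames"])
    # No explicit indexes: take the first 121 frames.
    # Clamping to the video length is an assumption, not confirmed by the thread.
    return list(range(min(DEFAULT_NUM_FRAMES, total_frames)))


# Hypothetical annotations, one without and one with a "frames" key.
default_indexes = select_frame_indexes({"video": "a.mp4"}, total_frames=300)
custom_indexes = select_frame_indexes({"video": "b.mp4", "frames": [0, 8, 16]}, total_frames=300)
print(len(default_indexes), custom_indexes)
```

Note that picking frames by index says nothing about fps, so videos with different frame rates would cover different durations for the same 121 indexes unless the `frames` key is set per video.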