rob-hen opened this issue 2 weeks ago
Hi, we do not use sequence parallel during training. VIDEO_SYNC_GROUP
controls the number of processes that accept the same video batch as input. We find that this trick makes the gradient direction more stable (it optimizes over the whole latent sequence of a video, not just latents from different videos).
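Roughly, the grouping could look like the following sketch (illustrative only; the function name and seeding logic are my own, not the repo's actual data-loading code):

```python
# Illustrative sketch: every VIDEO_SYNC_GROUP consecutive ranks share a data
# seed, so they draw the same video batch, while different groups still see
# different videos. Not the repo's actual implementation.
import torch.distributed as dist

def get_video_seed(base_seed: int, video_sync_group: int) -> int:
    rank = dist.get_rank() if dist.is_initialized() else 0
    sync_group_id = rank // video_sync_group   # ranks 0..G-1 form group 0, etc.
    return base_seed + sync_group_id           # same seed -> same sampled videos

# With GPUS=8 and VIDEO_SYNC_GROUP=8, all ranks fall into group 0 and every
# process loads exactly the same video batch.
```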
Hi @jy0205,
thank you for the answer.
So with VIDEO_SYNC_GROUP=8 and GPUS=8, all GPUs get exactly the same videos. However, I don't see the difference between the processes; all of them will use exactly the same latent (the same clip from the videos): https://github.com/jy0205/Pyramid-Flow/blob/e4b02ef31edba13e509896388b1fedd502ea767c/dataset/dataset_cls.py#L192 .
I think video_sync_group doesn't split the same video latent; each process accepts the same video latent without splitting.
- This part is different from sequence parallel, which splits the latent along the time axis.
- Is that right?
Yes, you are right. The video_sync_group does not split the video. It works because different video ranks load different video lengths; you can see this in the sample_length method.
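To make that concrete, here is a hypothetical sketch of the idea (not the actual sample_length implementation): ranks inside one VIDEO_SYNC_GROUP share the same video but each draws a different temporal length, so together they cover the whole latent sequence instead of all optimizing the same fixed-length clip.

```python
# Hypothetical sketch, not the repo's sample_length: each rank in the sync
# group maps to a different clip length from the same shared video.
def sample_length(max_latent_frames: int, video_sync_group: int, rank: int) -> int:
    group_rank = rank % video_sync_group          # position inside the sync group
    # spread group ranks 0..G-1 over lengths up to max_latent_frames
    return max(1, (group_rank + 1) * max_latent_frames // video_sync_group)

# Example: max_latent_frames=16, video_sync_group=8 -> ranks 0..7 train on
# lengths 2, 4, 6, 8, 10, 12, 14, 16 of the same video.
```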
- Why is only the number of high-resolution units uniformly sampled?
- https://github.com/jy0205/Pyramid-Flow/blob/e4b02ef31edba13e509896388b1fedd502ea767c/pyramid_dit/pyramid_dit_for_video_gen_pipeline.py#L360
All the stages employ uniform sampling. We make the video token sequence length-balanced (i.e., the sum of token lengths is kept fixed).
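One way to picture the length balancing (the names and token counts below are assumptions for illustration, not values from the repo): the number of high-resolution units is drawn uniformly, and the remaining token budget is filled with lower-resolution units, so the total token length stays fixed.

```python
# Hedged sketch of length-balanced sampling; TOKENS_HIGH, TOKENS_LOW and the
# budget are made-up values, not taken from Pyramid-Flow.
import random

TOKENS_HIGH = 4096   # tokens per high-resolution unit (assumed)
TOKENS_LOW = 256     # tokens per low-resolution unit (assumed)

def sample_units(token_budget: int = 16384) -> tuple[int, int]:
    max_high = token_budget // TOKENS_HIGH
    num_high = random.randint(0, max_high)            # uniform over allowed counts
    remaining = token_budget - num_high * TOKENS_HIGH
    num_low = remaining // TOKENS_LOW                 # fill the rest of the budget
    return num_high, num_low                          # total tokens == token_budget

# num_high=2 -> 2*4096 + 32*256 = 16384 tokens; num_high=4 -> 16384 tokens.
```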
Hi all,
the provided script train_pyramid_flow.sh does not set the flag use_sequence_parallel. In that case, what is the purpose of using VIDEO_SYNC_GROUP=8? Why would we want all workers to use the same video?