jjihwan / FIFO-Diffusion_public

Official implementation of FIFO-Diffusion
https://jjihwan.github.io

question about trade off between quality and speed #2

Closed Dorniwang closed 1 month ago

Dorniwang commented 1 month ago

From Sec. 4.2 of the paper, it seems that latent partitioning, which improves quality by reducing the gap between training and inference, also increases the number of denoising steps. Does that mean we either need multiple GPUs to speed things up, or we just get slower inference?

jjihwan commented 1 month ago

Yes, since latent partitioning requires more computation per frame than vanilla diagonal denoising, you must choose between using multiple GPUs and accepting slower inference. However, latent partitioning with n=4 uses 64 inference steps (16×4), which is not slower than the original video diffusion models (they often use 50 to 150 steps for inference). In fact, it is much faster than VDMs when using multiple GPUs.
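The arithmetic above can be sketched in a few lines. This is only an illustrative back-of-the-envelope calculation using the numbers mentioned in the thread (16 base steps, n=4 partitions); the helper functions and the assumption of ideal linear scaling across GPUs are mine, not part of the FIFO-Diffusion codebase:

```python
# Hypothetical step-count comparison for latent partitioning,
# using the numbers from this thread (not actual code from the repo).

def total_steps(base_steps: int, n_partitions: int) -> int:
    """Denoising steps per frame with latent partitioning:
    each of the n partitions contributes its own pass over the queue."""
    return base_steps * n_partitions

def steps_per_gpu(steps: int, n_gpus: int) -> float:
    """Idealized per-GPU workload if partitions are split evenly
    across GPUs (ignores communication/synchronization overhead)."""
    return steps / n_gpus

steps = total_steps(16, 4)            # 16 x 4 = 64 steps per frame
print(steps)                          # 64: within the typical 50-150
                                      # range of standard VDM samplers
print(steps_per_gpu(steps, 4))        # 16.0 steps per GPU with 4 GPUs
```

So with n=4 partitions spread over 4 GPUs, the idealized per-GPU work drops back to the 16 steps of vanilla diagonal denoising, which is why multi-GPU inference can end up faster than a standard video diffusion model.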

Dorniwang commented 1 month ago

> Yes, since latent partitioning requires more computation for generating one frame than vanilla diagonal denoising, you might choose either using multiple GPUs or slower inference. However, latent partitioning with n=4 uses 64 inference steps (16×4), which is not slower than original video diffusion models (they often use 50 to 150 steps for inference). In fact, it is much faster than VDMs when using multiple GPUs.

Got it, thanks.