guoqincode / Open-AnimateAnyone

Unofficial Implementation of Animate Anyone

About time embedding in ReferenceNet #18

Closed RedRAINXXXX closed 9 months ago

RedRAINXXXX commented 9 months ago

In the official paper, the authors say

While ReferenceNet introduces a comparable number of parameters to the denoising UNet, in diffusion-based video generation, all video frames undergo denoising multiple times, whereas ReferenceNet only needs to extract features once throughout the entire process

But in your inference implementation, the forward pass of ReferenceNet is performed multiple times.

Would you consider fixing the timestep of the ReferenceNet?

guoqincode commented 9 months ago

At each timestep, ReferenceNet is executed only once, not n_frame times.

RedRAINXXXX commented 9 months ago

What I mean is that ReferenceNet should extract the reference features only once for the entire denoising process, and those features should then be reused at every timestep of the denoising UNet, rather than being recomputed at each timestep (let alone "n_frame times").
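The suggestion above can be sketched as a simple cache: run ReferenceNet once with a fixed timestep, store its features, and have the denoising loop read from the cache at every step. This is a minimal illustration, not the repo's actual code; `ReferenceFeatureCache`, `reference_net`, `denoising_unet`, and `fixed_timestep` are all hypothetical names, and the real modules pass features through attention layers rather than a single return value.

```python
class ReferenceFeatureCache:
    """Compute reference features once and reuse them at every denoising step.

    `reference_net` is any callable (latent, timestep) -> features.
    `fixed_timestep` reflects the proposal to freeze ReferenceNet's timestep
    so its output no longer depends on the denoising step.
    """

    def __init__(self, reference_net, fixed_timestep=0):
        self.reference_net = reference_net
        self.fixed_timestep = fixed_timestep
        self._features = None

    def get(self, ref_latent):
        # First call runs ReferenceNet with the fixed timestep; every later
        # call (one per denoising step) returns the cached features.
        if self._features is None:
            self._features = self.reference_net(ref_latent, self.fixed_timestep)
        return self._features


def denoise(denoising_unet, cache, latents, timesteps, ref_latent):
    # The denoising UNet still runs at every timestep, but the reference
    # features come from the cache, so ReferenceNet executes only once.
    for t in timesteps:
        latents = denoising_unet(latents, t, cache.get(ref_latent))
    return latents
```

Whether this is valid depends on the training setup: if ReferenceNet was trained with a timestep embedding that varies during denoising, its features at a fixed timestep may differ from what the denoising UNet expects.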

guoqincode commented 9 months ago

You can email me at guoqin@stu.pku.edu.cn and we'll talk about this in more detail!

RedRAINXXXX commented 9 months ago

I have sent you an email, please check it~