RedRAINXXXX closed this issue 9 months ago.
At each timestep, ReferenceNet is executed only once, not n_frame times.
What I mean is that ReferenceUNet should run only once for the whole denoising process, and its features should then be reused at every timestep of the denoising UNet. So why "n_frame times"?
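The call patterns being debated can be compared with a toy sketch. `reference_net`, `denoising_unet`, the mode names, and the use of a fixed timestep of 0 in the "once" mode are all hypothetical stand-ins, not this repository's code. The sketch only counts how often the reference branch runs: once per frame per step, once per step (broadcast to all frames), or once for the whole loop.

```python
# Hypothetical stand-ins; they only mimic the data flow, not the real models.
calls = {"reference_net": 0}

def reference_net(ref_image, t):
    # Toy "ReferenceNet": features depend on the timestep t.
    calls["reference_net"] += 1
    return [x * (1 + t) for x in ref_image]

def denoising_unet(latent, t, ref_features):
    # Toy "denoising UNet" step that consumes the reference features.
    return [l - 0.01 * f for l, f in zip(latent, ref_features)]

def denoise(latents, ref_image, timesteps, mode):
    """latents holds one latent per frame (n_frame entries)."""
    cached = None
    if mode == "once":
        # Assumption: ReferenceNet is run a single time with a fixed timestep (0).
        cached = reference_net(ref_image, 0)
    for t in timesteps:
        if mode == "per_frame":
            feats = [reference_net(ref_image, t) for _ in latents]
        elif mode == "per_step":
            f = reference_net(ref_image, t)
            feats = [f] * len(latents)  # broadcast one result to every frame
        else:  # "once"
            feats = [cached] * len(latents)
        latents = [denoising_unet(l, t, f) for l, f in zip(latents, feats)]
    return latents

def count_calls(mode, n_frame, n_steps):
    """Run the toy loop and return how many times ReferenceNet was called."""
    calls["reference_net"] = 0
    latents = [[0.0, 0.0] for _ in range(n_frame)]
    denoise(latents, [1.0, 2.0], list(range(n_steps - 1, -1, -1)), mode)
    return calls["reference_net"]
```

With 4 frames and 10 steps, the three modes make 40, 10, and 1 ReferenceNet calls respectively.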
You can email me at guoqin@stu.pku.edu.cn and we'll talk about this in more detail!
I have sent you an email; please check it~
In the official paper, the authors say
But in your inference implementation, the ReferenceNet forward pass is performed multiple times.
Would you consider fixing the timestep of ReferenceUNet?
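One reading of the suggestion above is that ReferenceUNet's output depends on the timestep, so its features can only be computed once and reused if it always sees the same timestep. The toy sketch below illustrates that reasoning with `functools.lru_cache`: `reference_features`, `run`, and the choice of a fixed timestep of 0 are hypothetical, not taken from the paper or the repository.

```python
from functools import lru_cache

# Hypothetical toy ReferenceUNet: its output depends on the timestep t,
# so features computed at one t are not valid at another t.
@lru_cache(maxsize=None)
def reference_features(ref_image: tuple, t: int) -> tuple:
    return tuple(x * (1.0 + t / 1000.0) for x in ref_image)

def run(ref_image, timesteps, fixed_t=None):
    """Query reference features at every denoising step.

    If fixed_t is given, ReferenceUNet always sees that timestep, so only the
    first query computes anything and later queries are cache hits.
    Returns (features per step, number of real ReferenceUNet evaluations).
    """
    reference_features.cache_clear()
    feats = [reference_features(ref_image, fixed_t if fixed_t is not None else t)
             for t in timesteps]
    return feats, reference_features.cache_info().misses
```

With 20 denoising steps, passing `fixed_t=0` yields a single evaluation and identical features at every step, while letting the timestep vary forces 20 evaluations whose results differ.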