What about adding time embedding to the pose guider?

guoqincode / Open-AnimateAnyone

Unofficial Implementation of Animate Anyone


What about adding time embedding to the pose guider? #1

Status: Closed (jaymefosa closed this issue 9 months ago)

jaymefosa commented 9 months ago

As the title mentions, has this crossed your mind?

guoqincode commented 9 months ago

In the Animate Anyone paper, the authors did not add a time embedding to the pose guider, and I do not think one is necessary. The pose essentially provides layout information, which makes the model pay more attention to the pose region during training. For inspiration, see this paper: https://arxiv.org/abs/2305.03382

jaymefosa commented 9 months ago

My concern is that if the denoising UNet receives pose latents at the same strength regardless of timestep (during training or inference), the UNet alone carries the burden of modulating the pose strength based on the timestep. (Early timesteps around 900 should have stronger layout information than a timestep around 100 would need.)
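To make the suggestion concrete, here is a minimal, framework-free sketch of a timestep-dependent gate on pose features. It is not code from this repository or from the Animate Anyone paper. The names `timestep_embedding`, `pose_gate`, and `modulate_pose`, as well as the `weights` and `bias` parameters, are hypothetical; in a real model the pose features would be tensors and the gate would be a learned layer.

```python
import math


def timestep_embedding(t, dim=8, max_period=10000.0):
    """Sinusoidal embedding of a scalar diffusion timestep (dim must be even)."""
    half = dim // 2
    # Geometrically spaced frequencies, from 1 down towards 1 / max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.cos(t * f) for f in freqs] + [math.sin(t * f) for f in freqs]


def pose_gate(t, weights, bias):
    """Hypothetical gate: sigmoid of a linear map of the timestep embedding.

    `weights` and `bias` stand in for parameters that would be learned.
    """
    emb = timestep_embedding(t, dim=len(weights))
    z = sum(w * e for w, e in zip(weights, emb)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def modulate_pose(pose_features, t, weights, bias):
    """Scale pose features by a timestep-dependent factor in (0, 1)."""
    g = pose_gate(t, weights, bias)
    return [g * x for x in pose_features]


if __name__ == "__main__":
    # Illustrative (untrained) parameters: the gate differs between timesteps.
    weights, bias = [0.5] * 8, 0.0
    for t in (900, 100):
        print(t, pose_gate(t, weights, bias))
```

With all-zero weights and zero bias the gate is a constant 0.5, which corresponds to the timestep-independent case discussed above. Training would be expected to move the gate away from a constant only if timestep-dependent pose strength actually helps.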

guoqincode commented 9 months ago

I understand what you mean. I will try your suggestion once I have finished training the model the way the paper describes. Thank you.

jaymefosa commented 9 months ago

I've seen a lot of diffusion papers, but not "Guided Image Synthesis via Initial Image Editing in Diffusion Model." Thanks very much for sharing it.