wfanyue / DPG-T2I-Personalization

[ECCV 2024] Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning

Is the Q_phi an identity mapping? #4

Open · CharlesGong12 opened 2 weeks ago

CharlesGong12 commented 2 weeks ago

Hi, thanks for your work!

Q_phi's input is torch.square(x_0_latents - x_0_gt), and its output is trained to fit F.mse_loss(x_0_gt, x_0_latents, reduction="mean"), which is exactly the mean of Q_phi's input. So doesn't Q_phi only need to learn an identity mapping followed by a mean? If so, how could it work?
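
For concreteness, the identity this question relies on can be checked directly; the tensor shapes below are hypothetical stand-ins for the latents:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the predicted and ground-truth latents.
x_0_latents = torch.randn(4, 4, 64, 64)
x_0_gt = torch.randn(4, 4, 64, 64)

# Q_phi's input, per the question above.
q_phi_input = torch.square(x_0_latents - x_0_gt)

# Its regression target is the mean of that same tensor, since
# F.mse_loss with reduction="mean" computes mean((input - target) ** 2).
target = F.mse_loss(x_0_gt, x_0_latents, reduction="mean")

assert torch.allclose(q_phi_input.mean(), target)
```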

wfanyue commented 1 week ago

Hi, thanks for your question!

Our work provides a framework for incorporating various supervision objectives into personalized T2I generation models.

Regarding your questions:

  1. Q_phi denotes the reward model. Its goal is to predict the reward and to act as a differentiable function through which the framework is optimized (a rough sketch of such a reward head follows this list).

  2. About the design:

     1. We first reformulate the current T2I personalization objective (e.g., DreamBooth). In that special case, Q_phi serves as an identity mapping.

     2. The code released so far is the implementation of 'Look Forward'. It can be used directly as the loss function to train the model, and it works well; here we integrate it into our DPG framework, where it is also effective.

     3. Our framework can handle more complex supervisory signals, such as DINO similarity (as mentioned in our paper) and human-face similarity. Since I have just graduated and am busy with work, that code should be released between October 1st and October 7th. Apologies for the delay.
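As mentioned in point 1 above, here is a minimal sketch of what a learned Q_phi reward head could look like under this framing. It is an illustration only, not the released DPG code: the architecture, channel count, and variable names are assumptions.

```python
import torch
import torch.nn as nn

class QPhi(nn.Module):
    """Minimal reward-head sketch: maps a per-pixel squared error in
    latent space to a scalar reward prediction. Architecture and sizes
    are illustrative assumptions, not the released implementation."""

    def __init__(self, in_channels: int = 4, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1),
            nn.SiLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling to a per-sample vector
            nn.Flatten(),
            nn.Linear(hidden, 1),     # scalar reward per sample
        )

    def forward(self, x_0_latents: torch.Tensor, x_0_gt: torch.Tensor) -> torch.Tensor:
        # In the identity-mapping special case this head would only need to
        # average its input; a learned head keeps the same differentiable
        # interface while allowing richer rewards (e.g. DINO or face similarity).
        return self.net(torch.square(x_0_latents - x_0_gt)).squeeze(-1)

# Usage with hypothetical latent shapes (batch of 4, SD-style 4-channel latents):
q_phi = QPhi()
reward = q_phi(torch.randn(4, 4, 64, 64), torch.randn(4, 4, 64, 64))
print(reward.shape)  # torch.Size([4])
```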

If you have any other questions, feel free to contact me.