yk7333 / d3po

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
https://arxiv.org/abs/2311.13231
MIT License
158 stars · 14 forks

Can we train D3PO without text prompts? #16

Open giovanlee opened 1 week ago

giovanlee commented 1 week ago

Hi, I'm interested in your research and am trying to fine-tune my diffusion model with human feedback data. I have an image diffusion model that samples images without text prompts, and I have human feedback (human preferences) over the sampled images, but I don't have text prompts for my images. Is it possible to train my diffusion model without text prompts, using only the human preference data?

Thank you.

yk7333 commented 1 week ago

D3PO needs to ensure that the initial Gaussian noise and the text prompt used to denoise two or more images are consistent; human preference selection is then performed on the generated images. If the text prompt is missing, the generated images might differ significantly: one image might depict a person while the other shows a landscape. I am not sure such an outcome would be desirable, because the space of images generated without a prompt is too broad. I suggest you first try this on some simpler tasks, such as using image compressibility or another automatic metric, and check whether the reward value increases. Once that is confirmed to work, you can move on to larger tasks.
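For reference, here is a minimal sketch of the compressibility proxy mentioned above, assuming PIL images. The JPEG-size heuristic and the `quality=95` setting are assumptions for illustration, and the 0/1 preference label is a stand-in for a human label; how you generate the image pair from your unconditional sampler is up to you.

```python
import io
from PIL import Image

def compressibility_reward(image: Image.Image) -> float:
    # Proxy reward: images that compress to a smaller JPEG get a higher
    # (less negative) reward, so the reward is the negative file size in kB.
    buf = io.BytesIO()
    image.save(buf, format="JPEG", quality=95)
    return -len(buf.getvalue()) / 1000.0

def proxy_preference(img_a: Image.Image, img_b: Image.Image) -> int:
    # Stand-in for a human preference label: 0 means img_a is preferred,
    # 1 means img_b is preferred.
    return 0 if compressibility_reward(img_a) >= compressibility_reward(img_b) else 1

# Usage (hypothetical): sample img_a and img_b from your unconditional model,
# then feed label = proxy_preference(img_a, img_b) to the preference training
# step in place of a human label, and watch whether the reward increases.
```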

giovanlee commented 1 week ago

Thanks for the quick reply! Actually, my diffusion model is trained on a very specific domain, namely medical imaging, so I think the space of generated images will not be too broad even without a text prompt. I was wondering whether training without a text prompt was possible, and with your reply I can now try it.

Thank you :)