Open giovanlee opened 1 week ago
D3PO requires that the two or more images being compared are denoised from the same initial Gaussian noise and the same text prompt. Human preference selection is then performed on the generated images. If the text prompt is missing, the generated images might differ significantly. For example, one image might depict a person while the other shows a landscape. I am not sure such an outcome would be desirable, because the space of images generated without a prompt is too broad. I suggest you try this on a simpler task first, such as using image compressibility or another automatic metric as the reward, and check whether the reward value increases. Once that is confirmed to work, you can move on to larger tasks.
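To make the two points above concrete, here is a minimal sketch of sampling a comparison pair from a shared initial noise, then ranking the pair with a compressibility-style proxy reward. It is not the D3PO code: `toy_denoise_step`, `sample_pair`, and `compressibility_reward` are hypothetical names, the denoiser is a toy stand-in for a real model, and `zlib` is used here only as an assumed substitute for an image codec. Passing `prompt=None` models the unconditional case.

```python
import random
import zlib


def toy_denoise_step(x, t, prompt, rng):
    # Hypothetical stand-in for one stochastic reverse-diffusion step.
    # A real model would predict noise from (x, t, prompt); prompt=None
    # corresponds to an unconditional model and is simply ignored here.
    return [0.9 * v + 0.1 * rng.gauss(0.0, 1.0) for v in x]


def sample_pair(denoise_step, size, num_steps, seed, prompt=None):
    rng = random.Random(seed)
    # Shared initial Gaussian noise: both samples start from the same x_init.
    x_init = [rng.gauss(0.0, 1.0) for _ in range(size)]
    samples = []
    for _ in range(2):
        x = list(x_init)
        # The two trajectories diverge only through the per-step noise.
        for t in reversed(range(num_steps)):
            x = denoise_step(x, t, prompt, rng)
        samples.append(x)
    return x_init, samples


def compressibility_reward(x):
    # Assumed proxy reward: quantize to bytes and return the negative
    # compressed size, so more compressible samples score higher.
    data = bytes(min(255, max(0, int(round(128 + 32 * v)))) for v in x)
    return -len(zlib.compress(data))


x_init, (img_a, img_b) = sample_pair(toy_denoise_step, size=64, num_steps=10, seed=0)
# Index of the "preferred" sample under the proxy reward (0 or 1).
preferred = 0 if compressibility_reward(img_a) >= compressibility_reward(img_b) else 1
```

In a real run, the proxy reward would be replaced by the human choice between the two images, and the average reward over training would be the quantity to watch.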
Thanks for the quick reply! Actually, my diffusion model is trained on a very specific domain, namely the medical domain. So I think that even without a text prompt, the space of generated images will not be too broad. I was wondering whether it is possible to train without text prompts, and based on your reply, I will try training without them.
Thank you :)
Hi, I am interested in your research and am trying to fine-tune my diffusion model with human feedback data. I have an image diffusion model that samples images without text prompts. I also have human feedback (human preferences) on the sampled images, but I don't have text prompts for my images. Is it possible to train my diffusion model without text prompts, using only human preference data?
Thank you.