yk7333 / d3po

[CVPR 2024] Code for the paper "Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model"
https://arxiv.org/abs/2311.13231
MIT License

Can it be used to train controlnet? How? #1

Closed universewill closed 11 months ago

universewill commented 11 months ago

Can it be used to train controlnet ?

yk7333 commented 11 months ago

Based on my understanding, the training process for ControlNet involves generating conditioning images from certain ground truth images through methods such as using Canny edge detection to obtain the image outlines, along with an image prompt. These conditioning images and prompts are then used as inputs to the diffusion model to generate images aligned with the ground truth images. The entire process likely does not require human feedback or involvement, which might render our approach unsuitable.
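To make the conditioning step above concrete, here is a minimal sketch of deriving an edge map from a ground-truth image. It uses a simple gradient-magnitude threshold as a stand-in for full Canny edge detection (a real ControlNet pipeline would typically use OpenCV's `cv2.Canny`), and only numpy so it stays self-contained:

```python
import numpy as np

def edge_map(image: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """Simplified edge detector: gradient magnitude + threshold.

    A stand-in for Canny edge detection. `image` is a 2-D grayscale
    array with values in [0, 1]; the returned binary map plays the
    role of a ControlNet conditioning image.
    """
    gy, gx = np.gradient(image.astype(np.float64))
    magnitude = np.hypot(gx, gy)
    return (magnitude > threshold).astype(np.uint8)

# Example: a white square on a black background -> edges at its border.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
edges = edge_map(img)
```

The edge map and the original image prompt would then be fed to the diffusion model as described above.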

universewill commented 11 months ago

> Based on my understanding, the training process for ControlNet involves generating conditioning images from certain ground truth images through methods such as using Canny edge detection to obtain the image outlines, along with an image prompt. These conditioning images and prompts are then used as inputs to the diffusion model to generate images aligned with the ground truth images. The entire process likely does not require human feedback or involvement, which might render our approach unsuitable.

Sorry, I didn't describe it clearly. What I mean is: I want to use DPO to fine-tune my already-trained ControlNet on human feedback data to get better results. Is that possible with d3po?

yk7333 commented 11 months ago

I understand, that's possible. You can import your trained ControlNet model as a pre-trained model, modify the code to take prompts and conditioning images as input, and then run scripts/sample.py to generate the corresponding images. After receiving human feedback results saved as a JSON file, you can fine-tune your ControlNet model by running scripts/train.py, enabling your model to generate images that better align with human preferences. Good luck!
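The feedback-collection step in the workflow above could be sketched as follows. Note the JSON field names here are hypothetical, not the exact schema that `scripts/train.py` expects (check the repository's scripts for the real format); the point is just recording a human's per-pair preference for images sampled from the same prompt and conditioning image:

```python
import json
import os
import tempfile

def record_preferences(pairs, choices, path):
    """Save pairwise human feedback as a JSON file.

    pairs   : list of (image_a_path, image_b_path) tuples, each pair
              generated from the same prompt + conditioning image.
    choices : list of 0/1 picks -- index of the image the annotator
              preferred in each pair.
    The field names below are illustrative only.
    """
    records = [
        {"image_a": a, "image_b": b, "preferred": c}
        for (a, b), c in zip(pairs, choices)
    ]
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
    return records

# Usage: two sampled pairs; the annotator preferred image A both times.
pairs = [("out/0_a.png", "out/0_b.png"), ("out/1_a.png", "out/1_b.png")]
out_path = os.path.join(tempfile.gettempdir(), "feedback.json")
recs = record_preferences(pairs, [0, 0], out_path)
```

A file like this would then be handed to the training script to fine-tune the ControlNet toward the preferred samples.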