lllyasviel / ControlNet

Let us control diffusion models!
Apache License 2.0

Training ControlNet, exclude text caption #422

Open vibe007 opened 1 year ago

vibe007 commented 1 year ago

Does anyone know how to finetune ControlNet on image-to-image tasks without considering the text prompt at all during training? E.g., for image restoration tasks like super-resolution/denoising.

I know that at inference time we can reduce the unconditional_guidance_scale, and at training time we could just pass in an empty string as a workaround, but I figured the optimal solution would exclude CLIP from the finetuning process entirely.
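The empty-string workaround mentioned above is usually implemented as random caption dropout in the training dataloader, so the model also learns an unconditional pathway. A minimal sketch (the helper name and `drop_rate` default are hypothetical, not from the ControlNet codebase; setting `drop_rate=1.0` reproduces the "always empty prompt" workaround):

```python
import random

def maybe_drop_prompt(prompt: str, drop_rate: float = 0.5, rng=random) -> str:
    """Replace the caption with an empty string with probability drop_rate.

    Hypothetical helper: drop_rate=1.0 always feeds "" (the workaround
    discussed above), while a value like 0.5 trains both the conditional
    and unconditional branches.
    """
    return "" if rng.random() < drop_rate else prompt

# Example usage inside a dataset's __getitem__:
# caption = maybe_drop_prompt(sample["caption"], drop_rate=0.5)
```

Note that even with all captions dropped, CLIP still runs on the empty string each step; truly removing it would require caching the empty-prompt embedding or altering the conditioning path in the model itself.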

engrmusawarali commented 1 year ago

I have a similar question. Do they drop 50% of the text prompts while training?

geroldmeisinger commented 1 year ago

All duplicates about "dropping prompts":

- https://github.com/lllyasviel/ControlNet/issues/93
- https://github.com/lllyasviel/ControlNet/issues/160
- https://github.com/lllyasviel/ControlNet/issues/246
- https://github.com/lllyasviel/ControlNet/issues/422
- https://github.com/lllyasviel/ControlNet/issues/506

jiangyuhangcn commented 11 months ago

> Anyone know how to finetune ControlNet on image-to-image tasks, without considering the text prompt at all during the training process? I.e. for image restoration tasks like super resolution/denoising
>
> I know at inference time, we can reduce the unconditional_guidance_scale, and at training time we could just pass in an empty string as a workaround, but I figured the optimal solution would involve completely excluding CLIP from the finetuning process

Hi! May I ask why reducing the unconditional_guidance_scale is more suitable for tasks without a text prompt? Thanks!
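For context on that question: the unconditional_guidance_scale enters through classifier-free guidance, which blends the conditional and unconditional noise predictions. A small sketch of the blend (standard CFG formula, not code from this repo):

```python
def cfg(eps_uncond: float, eps_cond: float, scale: float) -> float:
    """Classifier-free guidance blend of two noise predictions.

    scale = 1.0 returns the conditional prediction unchanged; larger
    scales amplify the prompt's influence, and scale = 0.0 falls back
    to the unconditional prediction.
    """
    return eps_uncond + scale * (eps_cond - eps_uncond)
```

With a lower scale the output leans toward the unconditional prediction, so the (empty or irrelevant) prompt matters less. And if the model was trained with empty prompts throughout, the two predictions coincide, so `cfg(e, e, s) == e` for any scale.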