Open vibe007 opened 1 year ago
I have a similar question. Do they drop 50% of the text prompts while training?
Anyone know how to fine-tune ControlNet on image-to-image tasks without considering the text prompt at all during training? I.e. for image restoration tasks like super-resolution/denoising
I know that at inference time we can reduce the unconditional_guidance_scale, and that at training time we could just pass in an empty string as a workaround, but I figured the optimal solution would exclude CLIP from the fine-tuning process entirely.
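For reference, the 50% figure mentioned above comes from the ControlNet paper's training recipe, where half of the captions are randomly replaced with an empty string so the model learns to infer semantics from the control image alone. A minimal sketch of that prompt-dropout step, assuming a training loop that passes each caption to the text encoder (`maybe_drop_prompt` is a hypothetical helper name, not from the ControlNet codebase):

```python
import random

def maybe_drop_prompt(prompt: str, drop_rate: float = 0.5, rng=random) -> str:
    # With probability drop_rate, replace the caption with an empty string
    # before it reaches CLIP, forcing the model to rely on the conditioning
    # image. drop_rate=0.5 matches the 50% dropout discussed above; setting
    # drop_rate=1.0 approximates training with no text prompt at all.
    return "" if rng.random() < drop_rate else prompt
```

Note that this still runs the (empty) prompt through CLIP each step; to exclude CLIP entirely, one could instead cache the empty-prompt embedding once and reuse it for every batch.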
Hi! May I ask why reducing the unconditional_guidance_scale is more suitable for tasks without a text prompt? Thanks!