lllyasviel / ControlNet

Let us control diffusion models!
Apache License 2.0

How to do promptless training for ControlNet? is there any script for that? #602

Open KhawlahB opened 9 months ago

KhawlahB commented 9 months ago

I need to do promptless training, how can I do it?

Is there any direct script for training a promptless ControlNet?

geroldmeisinger commented 9 months ago

just set the captions to empty string

all duplicates about "dropping prompts" https://github.com/lllyasviel/ControlNet/issues/93 https://github.com/lllyasviel/ControlNet/issues/160 https://github.com/lllyasviel/ControlNet/issues/246 https://github.com/lllyasviel/ControlNet/issues/422 https://github.com/lllyasviel/ControlNet/issues/506

KhawlahB commented 9 months ago

I already had this idea in mind (using an empty string), but the results do not look good. That's why I am asking whether there is a direct script for promptless image2image generation using ControlNet, so I can do promptless training directly.

@geroldmeisinger

geroldmeisinger commented 9 months ago

you don't want to drop all prompts, just a certain percentage, otherwise the CN becomes meaningless. something like rand() > 0.5 ? "" : caption. but to give a better answer we would need more details. what is the concept of your CN? how does it perform without prompt dropping?
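A minimal Python sketch of the partial prompt dropping described above, assuming captions are plain strings that pass through a helper before each training step. The name `maybe_drop_caption` and its parameters are hypothetical and not part of the ControlNet repository.

```python
import random


def maybe_drop_caption(caption, drop_prob=0.5, rng=random):
    """Return "" with probability drop_prob, otherwise the caption unchanged.

    Hypothetical helper: mirrors the `rand() > 0.5 ? "" : caption` idea.
    """
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    # rng.random() is in [0, 1), so drop_prob=0 never drops and drop_prob=1 always drops.
    return "" if rng.random() < drop_prob else caption


# Example: roughly half of the captions come back empty.
rng = random.Random(0)
captions = ["an aerial photograph"] * 8
mixed = [maybe_drop_caption(c, 0.5, rng) for c in captions]
```

Calling the helper inside the dataset's item lookup, rather than once up front, would give each sample a fresh coin flip every epoch.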

KhawlahB commented 9 months ago

The concept for my ControlNet is image-to-image translation: I want to feed the model an image from domain A and have it generate the corresponding image in domain B.

So I do not need the prompt. I have tried using an empty string and meaningless words, but the generated images had random textures; it mixed things up.

Can you please clarify how to drop the prompts? @geroldmeisinger

geroldmeisinger commented 9 months ago

can you post some images, please? the problem is unlikely to be due to the prompts. prompt dropping can subtly increase the quality but cannot fix a broken concept.

KhawlahB commented 9 months ago

it is similar to this...

Screenshot 2023-12-13 011044

It is image-to-image translation, so there is no need for a prompt. The input and output are images, and the condition will be an image as well, to control the generation.

Is prompt dropping going to help with my idea? @geroldmeisinger

geroldmeisinger commented 9 months ago

  1. if you want to train multiple concepts in your CN you need a lot more data (your image shows segments, greyscale2color, lines to image, all in one, etc.)
  2. imagine a line drawing of a circle. what is it supposed to generate from that without a prompt? a ball, an orange, a planet, a tire etc. unless you stay in the same domain (like images of street scenes) you have to guide it somehow. but for this you need to provide more details.

KhawlahB commented 9 months ago

1) It is one concept only. I showed different concepts just to make my point (I do not need a prompt). To be more specific, my idea is similar to this aerial-to-map example. In training, the input will be the aerial image and its corresponding map. At inference, the input will be the aerial image, and the generated image should be a map of that aerial image.

Screenshot 2023-12-13 011044

So the guidance will be the conditioning image.

2) What will happen if I drop 100% of the captions?

I hope it is clear now..

@geroldmeisinger

geroldmeisinger commented 9 months ago

what will happen if i dropped 100% of captions?

I don't know, I never tried it myself. Although it feels strange to not use any prompt at all on a diffusion model which requires prompts. You could also try using the same prompt for all images, e.g. "an aerial photograph". Maybe it's better to create your own promptless diffusion model just for aerial photographs. Another thing you could look into is sd_unlocked=true.
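A small Python sketch of the "same prompt for every image" idea, assuming training items are dictionaries with the keys `jpg` (target), `txt` (prompt) and `hint` (conditioning image), as in the repository's tutorial dataset. The helper `make_item` and the file names are hypothetical placeholders.

```python
FIXED_PROMPT = "an aerial photograph"  # the constant prompt suggested above


def make_item(source, target, prompt=FIXED_PROMPT):
    """Build one training item with a constant prompt.

    Hypothetical helper; the key names are an assumption based on the
    tutorial dataset layout (jpg = target, txt = prompt, hint = source).
    """
    return dict(jpg=target, txt=prompt, hint=source)


# Every pair gets the same text, so only the hint image varies between samples.
pairs = [("aerial_0.png", "map_0.png"), ("aerial_1.png", "map_1.png")]
items = [make_item(src, tgt) for src, tgt in pairs]
```

In a real dataset the `source` and `target` values would be loaded image arrays rather than file names.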

KhawlahB commented 9 months ago

I will give it a try. But what does (sd_unlocked=true) mean? How is it going to help me with promptless training? Please clarify.

@geroldmeisinger

geroldmeisinger commented 9 months ago

as far as I understand, it trains "deeper" into Stable Diffusion, which could be helpful here if you're working in the same domain
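For orientation, the option mentioned here appears to correspond to the `sd_locked` flag in the repository's training script, so "unlocked" would mean `sd_locked = False`. As far as I recall, unlocking adds the Stable Diffusion decoder blocks to the optimizer alongside the ControlNet weights. The Python sketch below only mirrors that selection logic. The function name is hypothetical, and the group names are written from memory and should be checked against the repository code.

```python
def trainable_groups(sd_locked=True):
    """List which parameter groups would be optimized (illustrative only).

    Assumption: with sd_locked=True only the ControlNet branch is trained;
    with sd_locked=False the SD decoder (output) blocks are trained as well.
    """
    groups = ["control_model"]
    if not sd_locked:
        # Group names recalled from memory; verify before relying on them.
        groups += ["diffusion_model.output_blocks", "diffusion_model.out"]
    return groups


locked = trainable_groups(sd_locked=True)
unlocked = trainable_groups(sd_locked=False)
```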

remember00000 commented 5 months ago

@KhawlahB I am trying similar things. How are your results? Really looking forward to your response :)

CuddleSabe commented 4 months ago

When attention is computed in the UNet, the prompt embedding serves as the key and value, so you can imagine what happens if the key and value are always the same.
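A toy Python illustration of this point, under a deliberately extreme assumption: every context token is identical. A real empty-prompt embedding contains several distinct tokens, so this shows the limiting case only. Because softmax weights sum to 1, attention over identical values returns that value whatever the query is. The function `attention` and all numbers are made up for illustration.

```python
import math


def attention(query, keys, values):
    """Single-query scaled dot-product attention on plain Python lists."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


# Degenerate context: four identical tokens used as both keys and values.
token = [0.3, -1.2, 0.7]
keys = [token] * 4
values = [token] * 4

out_a = attention([1.0, 0.0, 2.0], keys, values)
out_b = attention([-3.0, 5.0, 0.5], keys, values)
# Both outputs match `token` up to floating-point error, so the query has no effect.
```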