lllyasviel / ControlNet

Let us control diffusion models!

Problem in Training #554

Open cashtsangwh opened 11 months ago

cashtsangwh commented 11 months ago

I followed the guidelines here to train the ControlNet model: https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md. I want to train this model for image inpainting: the source image is a masked image (the masked region is white), and the target image is the original image. However, during training I found in the image log that the model also changes the content of the unmasked region. How can I solve this problem?

geroldmeisinger commented 11 months ago

You need to provide more details about your training. Why do you care about the unmasked region anyway when you don't need it for composition?

cashtsangwh commented 11 months ago

> You need to provide more details about your training. Why do you care about the unmasked region anyway when you don't need it for composition?

For an image inpainting task, I want to keep the unmasked region unchanged. I just followed the guidelines here: https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md and used PyTorch Lightning to train the model. The guidelines say to create a custom dataset, so my dataset consists of the following:

- source: an RGB masked image, where the white region is the masked region
- target: the original image

I just followed the settings in the guidelines to do the training:

```python
from share import *
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from tutorial_dataset import MyDataset
from cldm.logger import ImageLogger
from cldm.model import create_model, load_state_dict

# Configs
resume_path = './models/control_sd15_ini.ckpt'
batch_size = 4
logger_freq = 300
learning_rate = 1e-5
sd_locked = True
only_mid_control = False

# Load the model on CPU first; PyTorch Lightning moves it to the GPU.
model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict(resume_path, location='cpu'))
model.learning_rate = learning_rate
model.sd_locked = sd_locked
model.only_mid_control = only_mid_control

# Dataset, image logger, and trainer
dataset = MyDataset()
dataloader = DataLoader(dataset, num_workers=0, batch_size=batch_size, shuffle=True)
logger = ImageLogger(batch_frequency=logger_freq)
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger])

# Train
trainer.fit(model, dataloader)
```
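For reference, a dataset for this setup would follow the `tutorial_dataset.py` interface from train.md, which returns `dict(jpg=target, txt=prompt, hint=source)`. A minimal sketch, with the prompt.json path and folder layout assumed from the fill50k tutorial rather than taken from this thread:

```python
import json

import cv2
import numpy as np
from torch.utils.data import Dataset


class MyDataset(Dataset):
    def __init__(self):
        # prompt.json lists {"source": ..., "target": ..., "prompt": ...} per line,
        # as in the fill50k tutorial (the path here is an assumption).
        self.data = []
        with open('./training/inpaint/prompt.json', 'rt') as f:
            for line in f:
                self.data.append(json.loads(line))

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        item = self.data[idx]
        # source: the masked RGB image (masked region white); target: the original image
        source = cv2.imread('./training/inpaint/' + item['source'])
        target = cv2.imread('./training/inpaint/' + item['target'])
        source = cv2.cvtColor(source, cv2.COLOR_BGR2RGB)
        target = cv2.cvtColor(target, cv2.COLOR_BGR2RGB)
        # ControlNet convention: hint in [0, 1], target in [-1, 1]
        source = source.astype(np.float32) / 255.0
        target = (target.astype(np.float32) / 127.5) - 1.0
        return dict(jpg=target, txt=item['prompt'], hint=source)
```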

geroldmeisinger commented 11 months ago

Please also post the generated images. How many steps or samples did you train for? Did you reach convergence? It looks like a rendered image. Is this a domain-specific model, or what are you using it for?

> For an image inpainting task, I want to keep the unmasked region unchanged.

Yeah... but does it really work that way? I'd expect SD to change the whole image, because that's how SD works. But if you already have a mask, just copy and paste the regions you need back into your original image and you're done ... assuming the image makes sense in your original context, but for this I need to know more details.
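For illustration, that copy-paste step is just a masked composite; a minimal sketch, assuming float images and a binary mask where 1 marks the inpainted region (the function and names are hypothetical):

```python
import numpy as np

def composite(original: np.ndarray, generated: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep the original pixels outside the mask and the generated pixels inside it.

    original, generated: HxWx3 float arrays; mask: HxW float array, 1.0 = masked region.
    """
    mask = mask[..., None]  # broadcast the mask over the channel axis
    return mask * generated + (1.0 - mask) * original
```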

> `batch_size = 4`

I'd recommend increasing it (if you have enough VRAM) or using gradient accumulation steps; you get much better quality.
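For what it's worth, PyTorch Lightning exposes this as the Trainer's `accumulate_grad_batches` argument; a minimal sketch of the change to the Trainer line in the script above (the factor 4 is just an example):

```python
# Effective batch size becomes batch_size * accumulate_grad_batches (4 * 4 = 16 here),
# without the VRAM cost of a larger per-step batch.
trainer = pl.Trainer(gpus=1, precision=32, callbacks=[logger], accumulate_grad_batches=4)
```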

> `precision=32`

I'd recommend decreasing it to 16 while you are still improving your workflow (2x speed-up, slightly lower quality).
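That is again a one-argument change to the Trainer; a sketch, using Lightning's native mixed precision:

```python
# fp16 mixed precision: roughly halves memory use and speeds up training,
# at a slight cost in numerical precision.
trainer = pl.Trainer(gpus=1, precision=16, callbacks=[logger])
```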

cashtsangwh commented 11 months ago

> Please also post the generated images. How many steps or samples did you train for? Did you reach convergence? It looks like a rendered image. Is this a domain-specific model, or what are you using it for?
>
> > For an image inpainting task, I want to keep the unmasked region unchanged.
>
> Yeah... but does it really work that way? I'd expect SD to change the whole image, because that's how SD works. But if you already have a mask, just copy and paste the regions you need back into your original image and you're done ... assuming the image makes sense in your original context, but for this I need to know more details.
>
> > `batch_size = 4`
>
> I'd recommend increasing it (if you have enough VRAM) or using gradient accumulation steps; you get much better quality.
>
> > `precision=32`
>
> I'd recommend decreasing it to 16 while you are still improving your workflow (2x speed-up, slightly lower quality).

I am just providing an example of an input; the output generated image for this example cannot be provided. But you can expect the texture of the wall to change a lot. Although I could copy and paste the unmasked regions, the result would look extremely odd. For example, if the generated image has a red wall but the original image has a white wall, then they cannot be matched.

I also want to increase the batch size, but unfortunately I don't have enough VRAM. Gradient accumulation may be a possible solution, but I don't think it solves my problem.

I think ControlNet can be used for image inpainting. In fact, ControlNet v1.1 does have a pretrained image inpainting model, but I need to train my own model for a specific use.

There are about 50k source-target pairs, and I have trained on them for about 30 epochs.

geroldmeisinger commented 11 months ago

> the output generated image for this example cannot be provided

why?

Tanghui2000 commented 9 months ago

May I ask whether this problem has been solved? I encountered a similar situation in my experiments: the content of the generated image differs from that of the original image, which bothers me very much.

engrmusawarali commented 8 months ago

You can use Stable Diffusion inpainting for inference.
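A minimal sketch of what that might look like with the diffusers library (the model id, prompt, and file names are example choices, not something from this thread):

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

# Load a pretrained SD inpainting pipeline (model id is an example choice).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB")  # hypothetical input file
mask_image = Image.open("mask.png").convert("RGB")   # white = region to repaint

# Black pixels in the mask are preserved; white regions are regenerated.
result = pipe(prompt="a white wall", image=init_image, mask_image=mask_image).images[0]
result.save("output.png")
```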