lllyasviel / ControlNet

Let us control diffusion models!

How to generate images using your own trained controlnet model? #465

Open · Firgui2 opened this issue 1 year ago

Firgui2 commented 1 year ago

Following the official training instructions, I prepared the data and trained the model. Based on the test images, I found the best performance at the 22nd epoch, so I interrupted training there and got the following: a lightning_logs folder, with the updated ckpt stored under version_1 (I had trained once before; that first run was version_0 and was only a trial). There are three items in version_1: the first is hparams.yaml, which contains only a single pair of {}; the second is events.out.tfevents.1689045604.autodl-container-63e511b6ae-8ebbd39d.10590.0; and the last is the checkpoints folder, which contains only one file named epoch=22-step=5335.

But I found that the .py files that load weights for image generation either load the ControlNet .pth directly:

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict('./models/control_sd15_scribble.pth', location='cuda'))

or they load the pretrained SD ckpt first and then the ControlNet .pth:

model_name = 'control_v11p_sd15_scribble'
model = create_model(f'./models/{model_name}.yaml').cpu()
model.load_state_dict(load_state_dict('./models/v1-5-pruned.ckpt', location='cuda'), strict=False)
model.load_state_dict(load_state_dict(f'./models/{model_name}.pth', location='cuda'), strict=False)
model = model.cuda()

Also, the official yaml file they use has these values:

model:
  target: cldm.cldm.ControlLDM
  params:
    linear_start: 0.00085
    linear_end: 0.0120
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: "jpg"
    cond_stage_key: "txt"
    control_key: "hint"
    image_size: 64
    channels: 4
    cond_stage_trainable: false
    conditioning_key: crossattn
    monitor: val/loss_simple_ema
    scale_factor: 0.18215
    use_ema: False
    only_mid_control: False

    control_stage_config:
      target: cldm.cldm.ControlNet
      params:
        image_size: 32 # unused
        in_channels: 4
        hint_channels: 3
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    unet_config:
      target: cldm.cldm.ControlledUnetModel
      params:
        image_size: 32 # unused
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions: [ 4, 2, 1 ]
        num_res_blocks: 2
        channel_mult: [ 1, 2, 4, 4 ]
        num_heads: 8
        use_spatial_transformer: True
        transformer_depth: 1
        context_dim: 768
        use_checkpoint: True
        legacy: False

    first_stage_config:
      target: ldm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 256
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity

    cond_stage_config:
      target: ldm.modules.encoders.modules.FrozenCLIPEmbedder
May I ask what is going on here? I have considered the following possible causes:

1. I didn't set the number of epochs, because the official training tutorial doesn't explain how to set it, so I simply stopped at epoch 22. That interruption may have caused some loss to the model and left the hparams.yaml empty. If I do need to set the number of epochs, how should I set it?
2. I set sd_locked to True during training, which (as I understand it) means the SD model's weights might also have been modified a bit, so should the ckpt I get be a direct replacement for the official one? And where should I get the .pth from? Should I convert the ckpt to .pth?
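For reference, here is a minimal sketch of loading such a Lightning checkpoint directly for inference. It assumes the checkpoint under lightning_logs/.../checkpoints/ contains the full ControlLDM state dict (as saved during training) and reuses the same helpers as the demo scripts; the paths below are placeholders:

from cldm.model import create_model, load_state_dict

# Build the model from the cldm yaml (the empty hparams.yaml written by Lightning
# is not used here), then load the trained checkpoint directly.
config_path = './models/cldm_v15.yaml'  # placeholder: the yaml you trained with
ckpt_path = './lightning_logs/version_1/checkpoints/epoch=22-step=5335.ckpt'  # placeholder

model = create_model(config_path).cpu()
model.load_state_dict(load_state_dict(ckpt_path, location='cuda'))
model = model.cuda()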

ParadiseN1 commented 1 year ago

@Firgui2 figured it out?

alelordelo commented 1 year ago

Same thing here...

Should we convert ckpt to pth?

Firgui2 commented 1 year ago

Same thing here...

Should we convert ckpt to pth?

@Firgui2 figured it out?

I'm still thinking about it.

Firgui2 commented 1 year ago

Same thing here...

Should we convert ckpt to pth?

Maybe. I'm trying to do the same; otherwise there's no way to use the ckpt.

alelordelo commented 1 year ago

Same thing here... Should we convert ckpt to pth?

Maybe. I'm trying to do the same; otherwise there's no way to use the ckpt.

But how can we convert? Did you find out?

JaosonMa commented 1 year ago
import torch
from cldm.model import create_model, load_state_dict

# Configs
ckpt_path = './controlNet/ckpt/last.ckpt'  # the Lightning checkpoint you trained

# Build the model from the cldm yaml, load the trained checkpoint,
# then re-save the state dict as a plain .pth file.
model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict(ckpt_path, location='cpu'))

torch.save(model.state_dict(), "./controlNet/ckpt/last.pth")

Then you can use the .pth with gradio_xxxxx.py.
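As an optional sanity check, a rough sketch (assuming it is run from the ControlNet repo root, with the same placeholder paths as above): reload the exported .pth and confirm its keys match a model built from the same yaml.

import torch
from cldm.model import create_model

# Rebuild the model from the yaml and compare its keys against the exported .pth.
model = create_model('./models/cldm_v15.yaml').cpu()
state = torch.load('./controlNet/ckpt/last.pth', map_location='cpu')
missing, unexpected = model.load_state_dict(state, strict=False)
print(f'missing keys: {len(missing)}, unexpected keys: {len(unexpected)}')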

MuyuenLP commented 1 year ago
# Configs
ckpt_path = './controlNet/ckpt/last.ckpt'

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict(ckpt_path, location='cpu'))

torch.save(model.state_dict(),"./controlNet/ckpt/last.pth")

then you can use the .pth with gradio_xxxxx.py

It makes sense! Thank you!

CQxiaocaimi commented 1 year ago
# Configs
ckpt_path = './controlNet/ckpt/last.ckpt'

model = create_model('./models/cldm_v15.yaml').cpu()
model.load_state_dict(load_state_dict(ckpt_path, location='cpu'))

torch.save(model.state_dict(),"./controlNet/ckpt/last.pth")

Then you can use the .pth with gradio_xxxxx.py.

Sorry, I want to ask: are you talking about running train.py and gradio_xxxxx.py together, or just these two pieces of code? I want to train a particular ControlNet, such as ip2p, but the results are very bad. I have tried different ControlNet .pth files and the training results feel the same, so I suspect there is an error in my training method.

SummerWRain commented 8 months ago

@Firgui2 I wrote an inference script and it now works relatively normally; I hope it helps you. My coding skills are limited, so if anyone can optimize it further, that would be great!

from share import *

import os
import cv2
import numpy as np
import torch
import einops
from PIL import Image

from cldm.model import create_model, load_state_dict
from cldm.ddim_hacked import DDIMSampler
from annotator.util import resize_image

# Configs
resume_path = '/ControlNet/lightning_logs/version_6/checkpoints/last.ckpt'  # your checkpoint path
N = 1           # batch size
ddim_steps = 50

# Build the model from the yaml used for training and load the trained checkpoint.
model = create_model('./models/cldm_v21.yaml').cpu()
model.load_state_dict(load_state_dict(resume_path, location='cuda'))
model = model.cuda()
ddim_sampler = DDIMSampler(model)

# Load the conditioning image and resize it to the working resolution.
img_path = 'your image path'
img = cv2.imread(img_path)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = resize_image(img, 512)

# Prepare the control tensor: shape (N, C, H, W), float in [0, 1].
control = torch.from_numpy(img.copy()).float().cuda() / 255.0
control = torch.stack([control for _ in range(N)], dim=0)
control = einops.rearrange(control, 'b h w c -> b c h w').clone()
c_cat = control.cuda()

# NOTE: the positive conditioning reuses the unconditional (empty-prompt) embedding,
# so no text prompt influences the sampling here.
c = model.get_unconditional_conditioning(N)
uc_cross = model.get_unconditional_conditioning(N)
uc_cat = c_cat
uc_full = {"c_concat": [uc_cat], "c_crossattn": [uc_cross]}
cond = {"c_concat": [c_cat], "c_crossattn": [c]}
_, _, h, w = cond["c_concat"][0].shape  # avoid clobbering c with the channel count
shape = (4, h // 8, w // 8)

samples, intermediates = ddim_sampler.sample(ddim_steps, N,
                                             shape, cond, verbose=False, eta=0.0,
                                             unconditional_guidance_scale=9.0,
                                             unconditional_conditioning=uc_full
                                             )

# Decode the latents and convert to an 8-bit RGB image.
x_samples = model.decode_first_stage(samples)
x_samples = x_samples.squeeze(0)
x_samples = ((x_samples + 1.0) / 2.0).clamp(0.0, 1.0)  # clamp to avoid uint8 overflow
x_samples = x_samples.transpose(0, 1).transpose(1, 2)
x_samples = x_samples.cpu().numpy()
x_samples = (x_samples * 255).astype(np.uint8)

os.makedirs('./outputs', exist_ok=True)
image_name = img_path.split('/')[-1]
Image.fromarray(x_samples).save('./outputs/' + image_name)
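If you want a text prompt to steer the result, note that the script above reuses the unconditional (empty-prompt) embedding as the positive conditioning. A rough variation, reusing model, N, and c_cat from the script above and following the pattern used in the gradio_*.py demos (the prompt string is only a placeholder):

prompt = 'your prompt here'  # placeholder text
c = model.get_learned_conditioning([prompt] * N)
cond = {"c_concat": [c_cat], "c_crossattn": [c]}
# ...then call ddim_sampler.sample(...) exactly as above.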