tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

About train code #17

Open xyxxmb opened 1 year ago

xyxxmb commented 1 year ago

I ran tutorial_train.py and saved the related params of the unet and 'ip-adapter_sd15.bin'. But when I load the unet params with StableDiffusionPipeline, I get this warning:

weights of the model checkpoint were not used when initializing UNet2DConditionModel:
 ['down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight, up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight, mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight, mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight']

I think the reason may be these lines in the training code:

else:
    layer_name = name.split(".processor")[0]
    weights = {
        "to_k_ip.weight": unet_sd[layer_name + ".to_k.weight"],
        "to_v_ip.weight": unet_sd[layer_name + ".to_v.weight"],
    }
    attn_procs[name] = IPAttnProcessor(hidden_size=hidden_size, cross_attention_dim=cross_attention_dim)
    attn_procs[name].load_state_dict(weights)

So I want to know why it is set up like this, and how I should modify my inference code so that it does not produce such warnings.

xiaohu2015 commented 1 year ago

@xyxxmb hi, you can convert the weights by the following code:

import torch
ckpt = "checkpoint-50000/pytorch_model.bin"
sd = torch.load(p, map_location="cpu")
image_proj_sd = {}
ip_sd = {}
for k in sd:
    if k.startswith("unet"):
        pass
    elif k.startswith("image_proj_model"):
        image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
    elif k.startswith("adapter_modules"):
        ip_sd[k.replace("adapter_modules.", "")] = sd[k]

torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "ip_adapter.bin")

then, you can use our demo code to load the trained model.
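For reference, loading the converted ip_adapter.bin with the demo IPAdapter class looks roughly like this (a minimal sketch; the base model, the paths, and reference_image are placeholders for your own setup):

import torch
from diffusers import StableDiffusionPipeline
from ip_adapter import IPAdapter

base_model_path = "runwayml/stable-diffusion-v1-5"   # placeholder base model
image_encoder_path = "models/image_encoder"          # CLIP image encoder used during training
ip_ckpt = "ip_adapter.bin"                           # file written by the conversion script above
device = "cuda"

pipe = StableDiffusionPipeline.from_pretrained(base_model_path, torch_dtype=torch.float16)
ip_model = IPAdapter(pipe, image_encoder_path, ip_ckpt, device)
# images = ip_model.generate(pil_image=reference_image, num_samples=1, num_inference_steps=50)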

xyxxmb commented 1 year ago

@xyxxmb hi, you can convert the weights by the following code: [...]

Thanks, but in the training code the unet is also trained. Why not save the unet?

xiaohu2015 commented 1 year ago

@xyxxmb hi, the unet is not trained:

https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L255 https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L305

we only add unet to IPAdapter for mixed-precision training.
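To illustrate the pattern (a toy, self-contained sketch, not the actual training script; the nn.Linear modules stand in for the real UNet, ImageProjModel, and IPAttnProcessor weights):

import itertools
import torch
from torch import nn

unet = nn.Linear(8, 8)                               # stand-in for the pretrained UNet (frozen)
image_proj_model = nn.Linear(8, 8)                   # stand-in for ImageProjModel (trained)
adapter_modules = nn.ModuleList([nn.Linear(8, 8)])   # stand-in for the new IPAttnProcessor weights (trained)

# The UNet is frozen; it is only wrapped into the IPAdapter training module so that
# accelerate/deepspeed applies mixed precision to the whole forward pass.
unet.requires_grad_(False)

params_to_opt = itertools.chain(image_proj_model.parameters(), adapter_modules.parameters())
optimizer = torch.optim.AdamW(params_to_opt, lr=1e-4, weight_decay=0.01)

assert not any(p.requires_grad for p in unet.parameters())  # no UNet parameter gets gradients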

xyxxmb commented 1 year ago

@xyxxmb hi, the unet is not trained:

https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L257 https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L307

Sorry, I only see Line 325: https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L325

I want to know why the code sets these extra params for the unet: https://github.com/tencent-ailab/IP-Adapter/blob/main/tutorial_train.py#L284 ~ L290

xiaohu2015 commented 1 year ago

@xyxxmb hi, these extra params are the cross-attention projections for the image features. In diffusers, we can reset the attention processors to add new params; the LoRA training code also uses this trick: https://github.com/huggingface/diffusers/blob/main/examples/text_to_image/train_text_to_image_lora.py
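For context, a condensed sketch of what that block does (roughly paraphrasing tutorial_train.py; the model path is a placeholder, and AttnProcessor/IPAttnProcessor are the repo's processor classes):

import torch
from diffusers import UNet2DConditionModel
from ip_adapter.attention_processor import AttnProcessor, IPAttnProcessor

unet = UNet2DConditionModel.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="unet")
unet_sd = unet.state_dict()

attn_procs = {}
for name in unet.attn_processors.keys():
    # attn1 is self-attention (no image tokens); attn2 is cross-attention.
    cross_attention_dim = None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    else:  # down_blocks
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]
    if cross_attention_dim is None:
        attn_procs[name] = AttnProcessor()
    else:
        # new to_k_ip/to_v_ip projections, initialized from the frozen to_k/to_v weights
        layer_name = name.split(".processor")[0]
        weights = {
            "to_k_ip.weight": unet_sd[layer_name + ".to_k.weight"],
            "to_v_ip.weight": unet_sd[layer_name + ".to_v.weight"],
        }
        attn_procs[name] = IPAttnProcessor(hidden_size=hidden_size, cross_attention_dim=cross_attention_dim)
        attn_procs[name].load_state_dict(weights)

unet.set_attn_processor(attn_procs)
adapter_modules = torch.nn.ModuleList(unet.attn_processors.values())  # the trainable adapter weights

This is also why the warning in the first post appears: a plain StableDiffusionPipeline UNet has no to_k_ip/to_v_ip parameters, so those keys are ignored unless the attention processors are replaced (or the converted ip_adapter.bin is loaded through the demo class).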

Muchoww commented 1 year ago

Hi, I got this error, what should I do? Thanks.

RuntimeError: Error(s) in loading state_dict for Resampler: Missing key(s) in state_dict: "latents", "proj_in.weight", "proj_in.bias", "proj_out.weight", "proj_out.bias", "norm_out.weight", "norm_out.bias", "layers.0.0.norm1.weight", "layers.0.0.norm1.bias", "layers.0.0.norm2.weight", "layers.0.0.norm2.bias", "layers.0.0.to_q.weight", "layers.0.0.to_kv.weight", "layers.0.0.to_out.weight", "layers.0.1.0.weight", "layers.0.1.0.bias", "layers.0.1.1.weight", "layers.0.1.3.weight", "layers.1.0.norm1.weight", "layers.1.0.norm1.bias", "layers.1.0.norm2.weight", "layers.1.0.norm2.bias", "layers.1.0.to_q.weight", "layers.1.0.to_kv.weight", "layers.1.0.to_out.weight", "layers.1.1.0.weight", "layers.1.1.0.bias", "layers.1.1.1.weight", "layers.1.1.3.weight", "layers.2.0.norm1.weight", "layers.2.0.norm1.bias", "layers.2.0.norm2.weight", "layers.2.0.norm2.bias", "layers.2.0.to_q.weight", "layers.2.0.to_kv.weight", "layers.2.0.to_out.weight", "layers.2.1.0.weight", "layers.2.1.0.bias", "layers.2.1.1.weight", "layers.2.1.3.weight", "layers.3.0.norm1.weight", "layers.3.0.norm1.bias", "layers.3.0.norm2.weight", "layers.3.0.norm2.bias", "layers.3.0.to_q.weight", "layers.3.0.to_kv.weight", "layers.3.0.to_out.weight", "layers.3.1.0.weight", "layers.3.1.0.bias", "layers.3.1.1.weight", "layers.3.1.3.weight". Unexpected key(s) in state_dict: "proj.weight", "proj.bias", "norm.weight", "norm.bias".

xiaohu2015 commented 1 year ago

@Muchoww it seems that you loaded the wrong weights (the weights for IP-Adapter) into the IP-Adapter-Plus model.
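A quick way to check which variant a checkpoint belongs to is to look at its image_proj keys (a small diagnostic sketch; my_checkpoint.bin is a placeholder, and the key names are taken from the error above):

import torch

sd = torch.load("my_checkpoint.bin", map_location="cpu")  # placeholder path
keys = set(sd["image_proj"].keys())

if "proj.weight" in keys:
    print("plain IP-Adapter (ImageProjModel) weights -> load with IPAdapter")
elif "latents" in keys or "proj_in.weight" in keys:
    print("IP-Adapter-Plus (Resampler) weights -> load with IPAdapterPlus")
else:
    print("unrecognized image_proj keys:", sorted(keys)[:5])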

Muchoww commented 1 year ago

@Muchoww it seems that you loaded the wrong weights (the weights for IP-Adapter) into the IP-Adapter-Plus model.

Oh, thanks. How do I train IP-Adapter-Plus? Is there a demo?

xiaohu2015 commented 1 year ago

Oh, thanks. How do I train IP-Adapter-Plus? Is there a demo?

https://github.com/tencent-ailab/IP-Adapter/issues/4

h3clikejava commented 11 months ago

@xyxxmb hi, you can convert the weights by the following code: [...]

Why is the ip-adapter.bin file generated with this method 89.3M instead of 44M? I only used 9 images during training. What training factors affect the size of this file? My training script is:

accelerate launch --num_processes 8 --mixed_precision "fp16" tutorial_train.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --image_encoder_path="./models/image_encoder" \
  --data_json_file="./train.json" \
  --data_root_path="./result" \
  --resolution=512 \
  --train_batch_size=4 \
  --dataloader_num_workers=4 \
  --learning_rate=1e-04 \
  --weight_decay=0.01 \
  --output_dir="./output" \
  --save_steps=100

xiaohu2015 commented 11 months ago

@h3clikejava my weights are fp16 (I used deepspeed fp16, and can only get fp16 weights)
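The roughly 2x difference is consistent with the tensors being saved in fp32 rather than fp16. If you want the smaller file, you can cast them before saving (a sketch, assuming the {"image_proj": ..., "ip_adapter": ...} layout produced by the conversion script above):

import torch

sd = torch.load("ip_adapter.bin", map_location="cpu")
# cast both weight groups ("image_proj" and "ip_adapter") to fp16, roughly halving the file size
sd_fp16 = {group: {k: v.half() for k, v in weights.items()} for group, weights in sd.items()}
torch.save(sd_fp16, "ip_adapter_fp16.bin")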

chuck-ma commented 10 months ago

@xyxxmb hi, you can convert the weights by the following code: [...]

It seems that sd should be loaded as the following code:

sd = torch.load(ckpt, map_location="cpu")

xiaohu2015 commented 10 months ago

It seems that sd should be loaded as the following code:

sd = torch.load(ckpt, map_location="cpu")

yes!