tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

model loading errors when testing a trained model #168

Open qpc1611094 opened 11 months ago

qpc1611094 commented 11 months ago

Hi, I have trained a new model but ran into errors when testing it. Here is what I did:

  1. train a model with:

    accelerate launch --num_processes 2 --multi_gpu --mixed_precision "fp16" \
    tutorial_train.py \
    --pretrained_model_name_or_path="stable-diffusion-v1-5/" \
    --image_encoder_path="image_encoder" \
    --data_json_file="ffhq_data.json" \
    --data_root_path="" \
    --mixed_precision="fp16" \
    --resolution=512 \
    --train_batch_size=4 \
    --dataloader_num_workers=4 \
    --learning_rate=1e-04 \
    --weight_decay=0.01 \
    --output_dir="output_dir/ffhq" \
    --save_steps=2000 \
    --num_train_epochs=1

    the image_encoder was downloaded from the link you provided, and this step works

  2. convert weight with:

    
    import torch
    from safetensors.torch import load_file
    ckpt = "output_dir/ffhq/checkpoint-2000/model.safetensors"
    sd = load_file(ckpt)
    image_proj_sd = {}
    ip_sd = {}
    for k in sd:
        if k.startswith("image_proj_model"):
            image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
        elif "_ip." in k:
            ip_sd[k.replace("unet.", "")] = sd[k]

    torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "output_dir/ffhq/checkpoint-2000/ip_adapter.bin")

this also works
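
As a quick sanity check (just a minimal sketch, assuming only the file layout written by the torch.save call above), the converted checkpoint can be inspected before trying to load it:

import torch

# Sketch: inspect the converted file; it should contain the two dicts saved above.
converted = torch.load("output_dir/ffhq/checkpoint-2000/ip_adapter.bin", map_location="cpu")
print(list(converted.keys()))  # expected: ['image_proj', 'ip_adapter']
for name, tensor in list(converted["ip_adapter"].items())[:4]:
    print(name, tuple(tensor.shape))  # first few to_k_ip / to_v_ip projection weights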

3. test the trained model with:

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from ip_adapter import IPAdapter

base_model_path = "stable-diffusion-v1-5"
vae_model_path = "sd-vae-ft-mse"
image_encoder_path = "image_encoder"
ip_ckpt = "output_dir/ffhq/checkpoint-2000/ip_adapter.bin"
device = "cuda"

def image_grid(imgs, rows, cols):
    assert len(imgs) == rows*cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i%cols*w, i//cols*h))
    return grid

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000, beta_start=0.00085, beta_end=0.012, beta_schedule="scaled_linear",
    clip_sample=False, set_alpha_to_one=False, steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)

pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path, torch_dtype=torch.float16, scheduler=noise_scheduler, vae=vae,
    feature_extractor=None, safety_checker=None,
)

ip_model = IPAdapter(pipe, image_encoder_path, ip_ckpt, device)


**Errors happen at the last line (ip_model cannot be loaded):**
_Unexpected key(s) in state_dict: "down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight"._

I tried IPAdapter, IPAdapterFull, IPAdapterPlus, IPAdapterXL, and IPAdapterPlusXL; none of them loads successfully.
Need your help! Thanks!
qpc1611094 commented 11 months ago

There are also errors like: Missing key(s) in state_dict: "1.to_k_ip.weight", "1.to_v_ip.weight", "3.to_k_ip.weight", "3.to_v_ip.weight", "5.to_k_ip.weight", "5.to_v_ip.weight", "7.to_k_ip.weight", "7.to_v_ip.weight", "9.to_k_ip.weight", "9.to_v_ip.weight", "11.to_k_ip.weight", "11.to_v_ip.weight", "13.to_k_ip.weight", "13.to_v_ip.weight", "15.to_k_ip.weight", "15.to_v_ip.weight", "17.to_k_ip.weight", "17.to_v_ip.weight", "19.to_k_ip.weight", "19.to_v_ip.weight", "21.to_k_ip.weight", "21.to_v_ip.weight", "23.to_k_ip.weight", "23.to_v_ip.weight", "25.to_k_ip.weight", "25.to_v_ip.weight", "27.to_k_ip.weight", "27.to_v_ip.weight", "29.to_k_ip.weight", "29.to_v_ip.weight", "31.to_k_ip.weight", "31.to_v_ip.weight".
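
The two messages point at keys from different namespaces: the converted checkpoint still carries the full UNet attention-processor paths, while the loader expects the numeric indices of its own ModuleList (which is why the "missing" keys look like "1.to_k_ip.weight", "3.to_k_ip.weight", ...). A small diagnostic sketch, assuming only the ip_adapter.bin produced above and the pipe from step 3:

import torch

# Sketch: the saved keys are full UNet module paths, while the loader enumerates
# pipe.unet.attn_processors and expects '<index>.to_k_ip.weight'-style names.
saved = torch.load("output_dir/ffhq/checkpoint-2000/ip_adapter.bin", map_location="cpu")["ip_adapter"]
print(list(saved.keys())[:2])                      # e.g. 'down_blocks.0.attentions.0. ... .to_k_ip.weight'
print(list(pipe.unet.attn_processors.keys())[:2])  # processor names the numeric indices are enumerated from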

xiaohu2015 commented 11 months ago

names_1 = ['down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight']

names_2 = [
"1.to_k_ip.weight", "1.to_v_ip.weight", "3.to_k_ip.weight", "3.to_v_ip.weight", "5.to_k_ip.weight", "5.to_v_ip.weight", "7.to_k_ip.weight", "7.to_v_ip.weight", "9.to_k_ip.weight", "9.to_v_ip.weight", "11.to_k_ip.weight", "11.to_v_ip.weight", "13.to_k_ip.weight", "13.to_v_ip.weight", "15.to_k_ip.weight", "15.to_v_ip.weight", "17.to_k_ip.weight", "17.to_v_ip.weight", "19.to_k_ip.weight", "19.to_v_ip.weight", "21.to_k_ip.weight", "21.to_v_ip.weight", "23.to_k_ip.weight", "23.to_v_ip.weight", "25.to_k_ip.weight", "25.to_v_ip.weight", "27.to_k_ip.weight", "27.to_v_ip.weight", "29.to_k_ip.weight", "29.to_v_ip.weight", "31.to_k_ip.weight", "31.to_v_ip.weight"
]

mapping = {k: v for k, v in zip(names_1, names_2)}

import torch
from safetensors.torch import load_file
ckpt = "output_dir/ffhq/checkpoint-2000/model.safetensors"
sd = load_file(ckpt)
image_proj_sd = {}
ip_sd = {}
for k in sd:
    if k.startswith("image_proj_model"):
        image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
    elif "_ip." in k: 
        ip_sd[mapping[k.replace("unet.", "")]] = sd[k] 

torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "output_dir/ffhq/checkpoint-2000/ip_adapter.bin")
qpc1611094 commented 11 months ago

Thanks a lot, but there is still a small bug: _RuntimeError: Error(s) in loading state_dict for ModuleList: size mismatch for 19.to_k_ip.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for 19.to_v_ip.weight: copying a param with shape torch.Size([1280, 768]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for 25.to_k_ip.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for 25.to_v_ip.weight: copying a param with shape torch.Size([640, 768]) from checkpoint, the shape in current model is torch.Size([320, 768]). size mismatch for 31.to_k_ip.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for 31.to_v_ip.weight: copying a param with shape torch.Size([320, 768]) from checkpoint, the shape in current model is torch.Size([1280, 768])._

The script I am using now is:

names_1 = [
"down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight"
]

names_2 = [
"1.to_k_ip.weight", "1.to_v_ip.weight", "3.to_k_ip.weight", "3.to_v_ip.weight", "5.to_k_ip.weight", "5.to_v_ip.weight", "7.to_k_ip.weight", "7.to_v_ip.weight", "9.to_k_ip.weight", "9.to_v_ip.weight", "11.to_k_ip.weight", "11.to_v_ip.weight", "13.to_k_ip.weight", "13.to_v_ip.weight", "15.to_k_ip.weight", "15.to_v_ip.weight", "17.to_k_ip.weight", "17.to_v_ip.weight", "19.to_k_ip.weight", "19.to_v_ip.weight", "21.to_k_ip.weight", "21.to_v_ip.weight", "23.to_k_ip.weight", "23.to_v_ip.weight", "25.to_k_ip.weight", "25.to_v_ip.weight", "27.to_k_ip.weight", "27.to_v_ip.weight", "29.to_k_ip.weight", "29.to_v_ip.weight", "31.to_k_ip.weight", "31.to_v_ip.weight"
]

mapping = {k: v for k, v in zip(names_1, names_2)}

import torch
from safetensors.torch import load_file
ckpt = "output_dir/ffhq/checkpoint-2000/model.safetensors"
sd = load_file(ckpt)
image_proj_sd = {}
ip_sd = {}
for k in sd:
    if k.startswith("image_proj_model"):
        image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
    elif "_ip." in k: 
        ip_sd[mapping[k.replace("unet.", "")]] = sd[k] 

torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "output_dir/ffhq/checkpoint-2000/ip_adapter.bin")
RichFrain commented 11 months ago

I'm having the same problem as you.

xiaohu2015 commented 11 months ago
names_1 = ['down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.0.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.0.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'down_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.1.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.1.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.1.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.2.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.2.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.2.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.3.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.3.attentions.1.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'up_blocks.3.attentions.2.transformer_blocks.0.attn2.processor.to_v_ip.weight', 'mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_k_ip.weight', 'mid_block.attentions.0.transformer_blocks.0.attn2.processor.to_v_ip.weight']

names_2 = [
"1.to_k_ip.weight", "1.to_v_ip.weight", "3.to_k_ip.weight", "3.to_v_ip.weight", "5.to_k_ip.weight", "5.to_v_ip.weight", "7.to_k_ip.weight", "7.to_v_ip.weight", "9.to_k_ip.weight", "9.to_v_ip.weight", "11.to_k_ip.weight", "11.to_v_ip.weight", "13.to_k_ip.weight", "13.to_v_ip.weight", "15.to_k_ip.weight", "15.to_v_ip.weight", "17.to_k_ip.weight", "17.to_v_ip.weight", "19.to_k_ip.weight", "19.to_v_ip.weight", "21.to_k_ip.weight", "21.to_v_ip.weight", "23.to_k_ip.weight", "23.to_v_ip.weight", "25.to_k_ip.weight", "25.to_v_ip.weight", "27.to_k_ip.weight", "27.to_v_ip.weight", "29.to_k_ip.weight", "29.to_v_ip.weight", "31.to_k_ip.weight", "31.to_v_ip.weight"
]

mapping = {k: v for k, v in zip(names_1, names_2)}

import torch
from safetensors.torch import load_file
ckpt = "output_dir/ffhq/checkpoint-2000/model.safetensors"
sd = load_file(ckpt)
image_proj_sd = {}
ip_sd = {}
for k in sd:
    if k.startswith("image_proj_model"):
        image_proj_sd[k.replace("image_proj_model.", "")] = sd[k]
    elif "_ip." in k: 
        ip_sd[mapping[k.replace("unet.", "")]] = sd[k] 

torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "output_dir/ffhq/checkpoint-2000/ip_adapter.bin")

I updated the script, can you try again?
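
For anyone who prefers not to maintain the two hard-coded lists, the same mapping can be derived from the UNet itself. This is only a sketch; it assumes (as in tutorial_train.py and the IPAdapter loader) that the ip_adapter weights are loaded into a ModuleList built from unet.attn_processors in iteration order, so the cross-attention processors end up at the odd indices 1, 3, 5, ...:

import torch
from safetensors.torch import load_file
from diffusers import UNet2DConditionModel

unet = UNet2DConditionModel.from_pretrained("stable-diffusion-v1-5", subfolder="unet")

# Map 'down_blocks....attn2.processor.to_k_ip.weight' -> '<index>.to_k_ip.weight',
# where <index> is the position of the processor in unet.attn_processors.
mapping = {}
for i, proc_name in enumerate(unet.attn_processors.keys()):
    if proc_name.endswith("attn2.processor"):  # only cross-attention layers carry the IP weights
        for suffix in ("to_k_ip.weight", "to_v_ip.weight"):
            mapping[f"{proc_name}.{suffix}"] = f"{i}.{suffix}"

ckpt = "output_dir/ffhq/checkpoint-2000/model.safetensors"
sd = load_file(ckpt)
image_proj_sd = {k.replace("image_proj_model.", ""): v for k, v in sd.items() if k.startswith("image_proj_model")}
ip_sd = {mapping[k.replace("unet.", "")]: v for k, v in sd.items() if "_ip." in k}

torch.save({"image_proj": image_proj_sd, "ip_adapter": ip_sd}, "output_dir/ffhq/checkpoint-2000/ip_adapter.bin")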

qpc1611094 commented 11 months ago

it works, thanks

RichFrain commented 11 months ago

it works, thanks so much

AbhinavJangra29 commented 2 months ago


riolys commented 2 months ago


Have you solved this?

hwj05140514 commented 1 month ago


I have the same issue as you. Have you solved this?