tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

Size mismatch in sdxl plus-face #317

Open Jeffman112 opened 3 months ago

Jeffman112 commented 3 months ago

Hi, I installed IP-Adapter and ran the demo notebook https://github.com/tencent-ailab/IP-Adapter/blob/main/ip_adapter_sdxl_plus-face_demo.ipynb on my PC and on Google Colab, and it gave me the following error:

Traceback (most recent call last):
  File "D:\Portrait Generator\Photomaker\download.py", line 16, in <module>
    ip_model = IPAdapterPlusXL(pipe, image_encoder_path, ip_ckpt, device, num_tokens=16)
  File "C:\Users\TheGoat\AppData\Local\Programs\Python\Python310\lib\site-packages\ip_adapter\ip_adapter.py", line 83, in __init__
    self.load_ip_adapter()
  File "C:\Users\TheGoat\AppData\Local\Programs\Python\Python310\lib\site-packages\ip_adapter\ip_adapter.py", line 136, in load_ip_adapter
    ip_layers.load_state_dict(state_dict["ip_adapter"])
  File "C:\Users\TheGoat\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ModuleList:
        Unexpected key(s) in state_dict: "69.to_k_ip.weight", "69.to_v_ip.weight", "71.to_k_ip.weight", "71.to_v_ip.weight", "73.to_k_ip.weight", "73.to_v_ip.weight", "75.to_k_ip.weight", "75.to_v_ip.weight", "77.to_k_ip.weight", "77.to_v_ip.weight", "79.to_k_ip.weight", "79.to_v_ip.weight", "81.to_k_ip.weight", "81.to_v_ip.weight", "83.to_k_ip.weight", "83.to_v_ip.weight", "85.to_k_ip.weight", "85.to_v_ip.weight", "87.to_k_ip.weight", "87.to_v_ip.weight", "89.to_k_ip.weight", "89.to_v_ip.weight", "91.to_k_ip.weight", "91.to_v_ip.weight", "93.to_k_ip.weight", "93.to_v_ip.weight", "95.to_k_ip.weight", "95.to_v_ip.weight", "97.to_k_ip.weight", "97.to_v_ip.weight", "99.to_k_ip.weight", "99.to_v_ip.weight", "101.to_k_ip.weight", "101.to_v_ip.weight", "103.to_k_ip.weight", "103.to_v_ip.weight", "105.to_k_ip.weight", "105.to_v_ip.weight", "107.to_k_ip.weight", "107.to_v_ip.weight", "109.to_k_ip.weight", "109.to_v_ip.weight", "111.to_k_ip.weight", "111.to_v_ip.weight", "113.to_k_ip.weight", "113.to_v_ip.weight", "115.to_k_ip.weight", "115.to_v_ip.weight", "117.to_k_ip.weight", "117.to_v_ip.weight", "119.to_k_ip.weight", "119.to_v_ip.weight", "121.to_k_ip.weight", "121.to_v_ip.weight", "123.to_k_ip.weight", "123.to_v_ip.weight", "125.to_k_ip.weight", "125.to_v_ip.weight", "127.to_k_ip.weight", "127.to_v_ip.weight", "129.to_k_ip.weight", "129.to_v_ip.weight", "131.to_k_ip.weight", "131.to_v_ip.weight", "133.to_k_ip.weight", "133.to_v_ip.weight", "135.to_k_ip.weight", "135.to_v_ip.weight", "137.to_k_ip.weight", "137.to_v_ip.weight", "139.to_k_ip.weight", "139.to_v_ip.weight".
        size mismatch for 61.to_k_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
        size mismatch for 61.to_v_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
        size mismatch for 63.to_k_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
        size mismatch for 63.to_v_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
        size mismatch for 65.to_k_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
        size mismatch for 65.to_v_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
        size mismatch for 67.to_k_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).
        size mismatch for 67.to_v_ip.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([640, 2048]).

The error occurs when loading the IPAdapterPlusXL model. I also tried copy-pasting the code into a plain Python script, and tried swapping the image encoder between the SDXL one and the regular one. Here is my code as a script:

import torch
from PIL import Image

from ip_adapter import IPAdapterPlusXL
from ip_adapter.custom_pipelines import StableDiffusionXLCustomPipeline

# SSD-1B as the base model, the ViT-H image encoder, and the SDXL
# plus-face IP-Adapter checkpoint.
base_model_path = "segmind/SSD-1B"
image_encoder_path = "models/image_encoder"
ip_ckpt = "sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin"
device = "cuda"

# Load the base pipeline in fp16 without the invisible watermarker.
pipe = StableDiffusionXLCustomPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    add_watermarker=False,
)

# Reference face image used as the image prompt.
image = Image.open("matt.jpg")

# This is the call that raises the size-mismatch error.
ip_model = IPAdapterPlusXL(pipe, image_encoder_path, ip_ckpt, device, num_tokens=16)
images = ip_model.generate(
    pil_image=image,
    num_samples=1,
    num_inference_steps=50,
    prompt="photo of a man on a surf board",
)
images[0].save("output.jpg")
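
As a side note, here is a minimal diagnostic sketch (assuming the same checkpoint path as in the script above) that compares what the checkpoint expects against what the SSD-1B UNet actually exposes. The "ip_adapter" sub-dict and its flat index keys are visible in the traceback; only cross-attention slots carry to_k_ip/to_v_ip weights, which is why only odd ModuleList indices appear in the error.

import torch
from diffusers import UNet2DConditionModel

# The IP-Adapter .bin holds a flat ModuleList state dict under "ip_adapter",
# one to_k_ip/to_v_ip pair per cross-attention layer, keyed by list index.
state_dict = torch.load("sdxl_models/ip-adapter-plus-face_sdxl_vit-h.bin", map_location="cpu")
ckpt_layers = {key.split(".")[0] for key in state_dict["ip_adapter"]}
print(f"cross-attention entries in checkpoint: {len(ckpt_layers)}")

# Count the cross-attention processors the SSD-1B UNet actually exposes
# (cross-attention layers are the ones named "...attn2.processor").
unet = UNet2DConditionModel.from_pretrained("segmind/SSD-1B", subfolder="unet")
cross_attn = [name for name in unet.attn_processors if name.endswith("attn2.processor")]
print(f"cross-attention layers in SSD-1B UNet: {len(cross_attn)}")
# If the two counts (or the per-layer widths) differ, the index-keyed
# load_state_dict in load_ip_adapter cannot line up, which is exactly
# the Unexpected key(s) / size mismatch error above.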
Jeffman112 commented 3 months ago

It works fine with normal SDXL (1.0 base) but not with SSD-1B. Is there any way to make it work with SSD-1B, for the speed-up and lower compute requirements?
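
The mismatch looks structural rather than a download or version problem: the checkpoint's flat, index-keyed ip_adapter state dict was built against the full SDXL UNet, while SSD-1B distills away transformer blocks, so both the number of cross-attention layers and some of their hidden sizes (the 1280 vs. 640 in the error) differ. Below is a hypothetical sketch, not a confirmed fix, of the kind of shape filter one could patch into load_ip_adapter in ip_adapter/ip_adapter.py, around the ip_layers.load_state_dict(...) call shown in the traceback. Because the flat indices were assigned against the full SDXL layer order, even shape-matching entries may land on the wrong layer, so expect degraded image conditioning; a retrained or properly re-indexed adapter for SSD-1B would be the reliable path.

# Hypothetical patch inside load_ip_adapter (a sketch, NOT a confirmed fix):
# keep only checkpoint entries whose key exists in this model's adapter
# ModuleList with an identical shape, then load non-strictly.
model_sd = ip_layers.state_dict()
filtered = {
    key: value
    for key, value in state_dict["ip_adapter"].items()
    if key in model_sd and model_sd[key].shape == value.shape
}
# strict=False tolerates the entries dropped above; the size-mismatch
# errors disappear because the filter removed those keys.
ip_layers.load_state_dict(filtered, strict=False)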