tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0

Error in IP-Adapter Portrait when using list of prompts #372

Closed. aycaecemgul closed this issue 1 month ago

aycaecemgul commented 1 month ago
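import cv2
import torch
from insightface.utils import face_align

# `app` is assumed to be an already-prepared insightface FaceAnalysis instance.
def get_face_embeds(images):
    faceid_embeds = []
    face_images = []

    for image_path in images:
        image = cv2.imread(image_path)
        if image is None:
            print(f"Could not read img: {image_path}")
            continue
        faces = app.get(image)
        # app.get returns a list that is empty (not None) when no face is
        # detected, so test truthiness before indexing faces[0].
        if not faces:
            print(f"No face found in: {image_path}")
            continue
        faceid_embeds.append(torch.from_numpy(faces[0].normed_embedding).unsqueeze(0).unsqueeze(0))
        face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)  # you can also segment the face
        face_images.append(face_image)

    # Require a detected face in every input image.
    if len(face_images) != len(images):
        return None, None

    # Concatenate per-image embeddings along dim=1: shape (1, n_images, embed_dim).
    faceid_embeds = torch.cat(faceid_embeds, dim=1)
    return faceid_embeds, face_images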

images = [
    "IMG_0122.jpg",
    "IMG_2042.jpg",
    "IMG_2043.jpg",
    "IMG_2034.jpg",
]
faceid_embeds, face_images = get_face_embeds(images)

prompt = "portrait photo of a beautiful woman"
negative_prompt = "blurry, ugly"

prompts = [prompt, prompt, prompt, prompt]
negative_prompts = [negative_prompt,negative_prompt,negative_prompt,negative_prompt]

result = ip_model.generate(
     prompt=prompts, negative_prompt=negative_prompts, faceid_embeds=faceid_embeds, shortcut=True, s_scale=0.1,
     num_samples=4, width=512, height=768, num_inference_steps=40)

The error I am getting:

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
[<ipython-input-20-09358eab019b>] in <cell line: 10>()
      8 negative_prompt = [negative_prompt,negative_prompt,negative_prompt,negative_prompt]
      9 # faceid_embeds_list = [faceid_embeds, faceid_embeds, faceid_embeds, faceid_embeds]
---> 10 result = ip_model.generate(
     11      prompt=prompts, negative_prompt=negative_prompts, faceid_embeds=faceid_embeds, shortcut=True, s_scale=0.1,
     12      num_samples=4, width=512, height=768, num_inference_steps=40)

[/usr/local/lib/python3.10/dist-packages/ip_adapter/ip_adapter_faceid_separate.py] in generate(self, faceid_embeds, prompt, negative_prompt, scale, num_samples, seed, guidance_scale, num_inference_steps, **kwargs)
    244                 negative_prompt=negative_prompt,
    245             )
--> 246             prompt_embeds = torch.cat([prompt_embeds_, image_prompt_embeds], dim=1)
    247             negative_prompt_embeds = torch.cat([negative_prompt_embeds_, uncond_image_prompt_embeds], dim=1)
    248 

RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 16 but got size 4 for tensor number 1 in the list.

I tried passing the embeds as a list, but that did not work. Should I concatenate the embeds 4 times? I think the IP-Adapter pipeline should handle this.

aycaecemgul commented 1 month ago

I reduced num_samples to 1 and repeated the face embeds along the batch dimension instead, and it worked!

face_embeds = face_embedding.repeat(num_samples, 1, 1)
pipeline.generate(
        prompt=positive_prompts,
        negative_prompt=negative_prompts,
        faceid_embeds=face_embeds,
        shortcut=True,
        num_samples=1,
        seed=seed_list,
        width=512,
        height=768,
        s_scale=1.0,
        num_inference_steps=40,
    )
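For anyone hitting the same error, here is a minimal sketch of the shape arithmetic behind both the failure and the workaround. It assumes the pipeline expands the prompt list and dim 0 of faceid_embeds by num_samples before concatenating them along dim=1, which is consistent with the "Expected size 16 but got size 4" message but is not confirmed line by line against the library. The helper name batch_sizes is hypothetical, and the repeat count in the workaround is assumed to equal the number of prompts (4).

```python
def batch_sizes(num_prompts, faceid_batch, num_samples):
    """Return (text_batch, image_batch) as they would enter the failing torch.cat.

    Assumption: both the prompt list and dim 0 of faceid_embeds are
    expanded by num_samples, as the error message suggests.
    """
    text_batch = num_prompts * num_samples
    image_batch = faceid_batch * num_samples
    return text_batch, image_batch


# Failing call: 4 prompts, faceid_embeds of shape (1, 4, dim), num_samples=4.
failing = batch_sizes(num_prompts=4, faceid_batch=1, num_samples=4)

# Workaround: embeds repeated to shape (4, 4, dim), num_samples=1.
working = batch_sizes(num_prompts=4, faceid_batch=4, num_samples=1)

print("failing:", failing)
print("working:", working)
```

torch.cat along dim=1 needs every other dimension to match, so the call only succeeds when the two batch sizes are equal, i.e. when dim 0 of faceid_embeds equals the number of prompts.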

Do you think we should implement this inside the repo? @xiaohu2015

xiaohu2015 commented 1 month ago

Hi, could you make a PR?

aycaecemgul commented 1 month ago

Sure, I'll look into it.