tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
Apache License 2.0
4.46k stars, 289 forks

Loading faceid plus lora gives strange outputs #351

Status: Open. Fqlox opened 1 month ago

Fqlox commented 1 month ago

I wanted to use the FaceID Plus v2 LoRA with the IP-Adapter FaceID Plus v2.

Here is a simpler test. I'm using RealVisXL V4.0 Lightning:

import torch
from diffusers import StableDiffusionXLPipeline
from PIL import Image
import cv2
from diffusers import DPMSolverSinglestepScheduler, DDIMScheduler, DPMSolverMultistepScheduler
from insightface.app import FaceAnalysis

pipe = StableDiffusionXLPipeline.from_pretrained(
    "models/StableDiffusion/RealvisXLv40_lightning",  # local directory
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ 2M
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

# Local dir
pipe.load_lora_weights("models/Lora", weight_name="ip-adapter-faceid-plusv2_sdxl_lora.safetensors", adapter_name="faceid")
pipe.fuse_lora(lora_scale=1.0)

from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlusXL
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"

ip_model = IPAdapterFaceIDPlusXL(pipe, image_encoder_path, "models/ipAdapter/ip-adapter-faceid-plusv2_sdxl.bin", "cuda")

# Extract face ID embeddings with InsightFace
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

images = ["img.png"]

faceid_embeds = []
for image_path in images:
    image = cv2.imread(image_path)  # cv2 loads the image as a BGR array
    faces = app.get(image)
    # Use the first detected face; the two unsqueeze calls add two leading dimensions
    faceid_embeds.append(torch.from_numpy(faces[0].normed_embedding).unsqueeze(0).unsqueeze(0))

# Concatenate the per-image embeddings along dim 1
faceid_embeds = torch.cat(faceid_embeds, dim=1)

# Generate

prompt = "A man smiling"

negative_prompt = "nude, hand, disfigured"

input_image = Image.open("img.png")

output = ip_model.generate(
    guidance_scale=1.5,
    face_image=input_image,
    scale=1.2,
    prompt=prompt,
    negative_prompt=negative_prompt,
    faceid_embeds=faceid_embeds,
    num_samples=1,
    width=512,
    height=768,
    num_inference_steps=20,
    seed=10
)
output[0].show()

I tried multiple schedulers as well.
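
Swapping schedulers on the same pipeline can follow the `from_config` pattern already used in the script. A minimal sketch, where `swap_scheduler` is a hypothetical helper (not part of diffusers), and which schedulers were actually tried is not stated in this post:

```python
def swap_scheduler(pipe, scheduler_cls):
    """Replace pipe.scheduler with scheduler_cls, reusing the current scheduler's config."""
    # Same pattern as the DPMSolverMultistepScheduler line in the script above.
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe


# Example usage, assuming `pipe` is the StableDiffusionXLPipeline from the script:
# from diffusers import DDIMScheduler, DPMSolverSinglestepScheduler
# for cls in (DDIMScheduler, DPMSolverSinglestepScheduler):
#     swap_scheduler(pipe, cls)
#     ...  # run ip_model.generate(...) again and compare the outputs
```

Keeping the swap in one place makes it easier to rerun the same seed and prompt with each scheduler, so any difference in the output comes from the scheduler alone.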

I got this image instead:

[image: tmptgn8ay1j]

alexblattner commented 1 month ago

It's broken; don't use the LoRA.
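
For reference, a minimal sketch of the setup from the original post with the LoRA left out, assuming the reply means dropping the `load_lora_weights` and `fuse_lora` calls and keeping everything else unchanged. `build_ip_model` is a hypothetical helper name, the paths are the original poster's local directories, and whether this resolves the strange output is not confirmed in this thread:

```python
def build_ip_model(
    base_model_dir="models/StableDiffusion/RealvisXLv40_lightning",
    ip_ckpt="models/ipAdapter/ip-adapter-faceid-plusv2_sdxl.bin",
    image_encoder_path="laion/CLIP-ViT-H-14-laion2B-s32B-b79K",
    device="cuda",
):
    """Build the FaceID Plus XL adapter as in the original post, minus the LoRA."""
    # Heavy imports are deferred so that defining this helper needs no GPU.
    import torch
    from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline
    from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlusXL

    pipe = StableDiffusionXLPipeline.from_pretrained(
        base_model_dir,
        torch_dtype=torch.float16,
    ).to(device)
    pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)

    # Assumption: no pipe.load_lora_weights(...) / pipe.fuse_lora(...) here,
    # following the advice above not to use the LoRA.
    return IPAdapterFaceIDPlusXL(pipe, image_encoder_path, ip_ckpt, device)
```

The returned object would then be used in place of `ip_model` in the original script, with the `generate(...)` call left as it is.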