tencent-ailab / IP-Adapter

The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with image prompt.
Apache License 2.0

implementation for multiple images, image weighting and negative embeds #99

Open cubiq opened 1 year ago

cubiq commented 1 year ago

Finally took the time to implement the missing features of the diffusers implementation. I also simplified the code and streamlined the workflow. There is now only one main IPAdapter class (not one for each model) that takes care of everything.

The core of the execution looks like this:

from PIL import Image

# ... (pipe, device and generator setup omitted)

reference = Image.open("reference_image.jpg")

# doesn't matter what model you are using, they all use the IPAdapter class
ip_adapter = IPAdapter(pipe, "ipadapter/model/path", "image/encoder/path", device=device)

# exports the text+image embeds
prompt_embeds, negative_prompt_embeds = ip_adapter.get_prompt_embeds(
    reference,
    prompt="positive prompt",
    negative_prompt="blurry,",
)

# use the pipe as usual, passing in the exported embeds
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    num_inference_steps=30,
    guidance_scale=6.0,
    generator=generator,
).images[0]
image.save("image.webp", lossless=True, quality=100)

Sending multiple images is as simple as:

reference1 = Image.open("reference_image_1.jpg")
reference2 = Image.open("reference_image_2.jpg")

prompt_embeds, negative_prompt_embeds = ip_adapter.get_prompt_embeds(
    [reference1, reference2],
    prompt="positive prompt",
    negative_prompt="blurry,",
)

It is also possible to send negative images. This matters because, in my experiments, it can sometimes give better results. You can use any image as a negative, but plain random noise seems to work better. This is an example:

[Image comparison, three panels: reference (no noise) | gaussian noise | mandelbrot noise]
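For anyone who wants to try the gaussian noise negative, here is a minimal sketch of how such an image could be generated. It is not taken from the linked repository: the function names, the 224x224 default size and the mean/std values are all assumptions picked for illustration.

```python
import random


def gaussian_noise_bytes(width, height, mean=127.5, std=50.0, seed=None):
    """Return width*height*3 bytes of clamped gaussian RGB noise.

    mean and std are illustrative assumptions, not values from the post.
    """
    rng = random.Random(seed)
    values = bytearray()
    for _ in range(width * height * 3):
        # Sample one channel value and clamp it to the valid 0..255 range.
        sample = int(round(rng.gauss(mean, std)))
        values.append(max(0, min(255, sample)))
    return bytes(values)


def gaussian_noise_image(width=224, height=224, seed=None):
    """Wrap the noise bytes in a PIL image (requires Pillow)."""
    # Imported lazily so the byte helper above works without Pillow.
    from PIL import Image

    data = gaussian_noise_bytes(width, height, seed=seed)
    return Image.frombytes("RGB", (width, height), data)
```

The post does not show how a negative image is passed to get_prompt_embeds, so check the linked repository for the exact argument before wiring this in.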

You can also give each image a weight when you send multiple images. I'm sure a better normalization is possible, but the current one kinda works.
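To make the normalization question concrete, below is a rough sketch of one simple option: scale the weights so they sum to 1, then take a weighted average of the per-image embeddings. This is an assumption for discussion only. The post does not describe how the weights are applied, and combine_embeds and normalize_weights are hypothetical helpers, not functions from the repository. Plain Python lists stand in for tensors.

```python
def normalize_weights(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def combine_embeds(embeds, weights):
    """Weighted average of equally sized embedding vectors.

    embeds is a list of vectors (one per reference image) and weights
    holds one number per vector.
    """
    if len(embeds) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    norm = normalize_weights(weights)
    dim = len(embeds[0])
    # Sum the weighted contribution of every image for each dimension.
    return [sum(w * e[i] for w, e in zip(norm, embeds)) for i in range(dim)]
```

With weights 1 and 3, the second image contributes three times as much as the first, and the result stays on the same scale as a single embedding.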

Please note that I don't have much experience with diffusers and I'm not sure what the best practices are, so the code structure might change in the coming days. Any feedback is welcome.

You can find the MIT-licensed code and a lot of examples here: https://github.com/cubiq/Diffusers_IPAdapter

Let me again thank Tencent AILab for making the IPAdapter models public.

xiaohu2015 commented 1 year ago

@cubiq That's great. Thank you very much for your contribution to IP-Adapter.