I finally took the time to implement the missing features of the diffusers implementation. I also simplified the code and streamlined the workflow. There is now only one main IPAdapter class (not one for each model) that takes care of everything.
The core of the execution looks like this:
from PIL import Image

# ... (pipe, device and generator are set up here)
reference = Image.open("reference_image.jpg")

# doesn't matter what model you are using, they all use the IPAdapter class
ip_adapter = IPAdapter(pipe, "ipadapter/model/path", "image/encoder/path", device=device)

# exports the text+image embeds
prompt_embeds, negative_prompt_embeds = ip_adapter.get_prompt_embeds(
    reference,
    prompt="positive prompt",
    negative_prompt="blurry,",
)

# use the pipe as always, attaching the exported embeds
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=negative_prompt_embeds,
    num_inference_steps=30,
    guidance_scale=6.0,
    generator=generator,
).images[0]
image.save("image.webp", lossless=True, quality=100)
It is also possible to send negative images. This matters because, in my experiments, it can sometimes give better results. You can use any image as a negative, but plain random noise seems to work better. This is an example:
[Example images: reference (no noise), gaussian noise, mandelbrot noise]
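Since random noise seems to work well as a negative image, here is a minimal sketch of one way to produce Gaussian noise pixels. The helper name `gaussian_noise_rgb`, the image size, and the `mean`/`std` defaults are my own assumptions for illustration and do not come from the repository. How the resulting image is then passed in as a negative depends on the IPAdapter class and is not shown here.

```python
import random


def gaussian_noise_rgb(width, height, mean=127.5, std=50.0, seed=None):
    """Return width * height * 3 bytes of Gaussian noise (hypothetical helper).

    Each channel value is drawn from a normal distribution and clipped
    to the valid 0..255 range.
    """
    rng = random.Random(seed)
    values = bytearray()
    for _ in range(width * height * 3):
        sample = rng.gauss(mean, std)
        values.append(min(255, max(0, int(round(sample)))))
    return bytes(values)


# 64x64 is an arbitrary size chosen for the example.
noise_bytes = gaussian_noise_rgb(64, 64, seed=0)

# Optional: wrap the raw bytes in a PIL image if Pillow is installed.
try:
    from PIL import Image

    noise_image = Image.frombytes("RGB", (64, 64), noise_bytes)
except ImportError:
    noise_image = None
```

Passing a fixed `seed` makes the noise reproducible, which helps when comparing results with and without a negative image.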
And of course you can give each image a weight when you send multiple images. I'm sure there's a better normalization that could be done, but it kinda works.
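To make the weighting idea concrete, here is a rough sketch of one possible normalization: scale the weights so they sum to 1, then take the weighted average of the per-image embeddings. This is only an assumption about how it could be done, not the method the repository uses. The functions `normalize_weights` and `combine_embeds` are hypothetical, and plain Python lists stand in for the real embedding tensors.

```python
def normalize_weights(weights):
    """Scale weights so they sum to 1 (hypothetical helper)."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [w / total for w in weights]


def combine_embeds(embeds, weights):
    """Weighted average of equally sized embedding vectors (hypothetical helper)."""
    if len(embeds) != len(weights):
        raise ValueError("need exactly one weight per embedding")
    norm = normalize_weights(weights)
    dim = len(embeds[0])
    return [sum(w * e[i] for w, e in zip(norm, embeds)) for i in range(dim)]


# Two toy 2-dimensional "embeddings"; the second image counts half as much.
combined = combine_embeds([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.5])
```

With this scheme only the ratio between the weights matters, so `[1.0, 0.5]` and `[2.0, 1.0]` give the same result.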
Please note that I don't have much experience with diffusers and I'm not sure what the best practices are, so the code structure might change in the coming days. Any feedback is welcome.
Sending multiple images is just as simple; examples are in the repository linked below.
You can find the MIT-licensed code and a lot of examples here: https://github.com/cubiq/Diffusers_IPAdapter
Let me thank again Tencent AILab for making the IPAdapter models public.