We are building Refiners, an open-source, PyTorch-based framework for easily training and running adapters on top of foundation models. Just wanted to let you know that IP-Adapter is now fully supported in Refiners! (Congrats on the great work, by the way!)
For example, an equivalent to the "IP-Adapter with fine-grained features" demo would look like this:
import torch
from PIL import Image
from refiners.foundationals.latent_diffusion import StableDiffusion_1, SD1IPAdapter
from refiners.foundationals.latent_diffusion.schedulers import DDIM
from refiners.fluxion.utils import load_from_safetensors, manual_seed
device = "cuda"
image = Image.open("statue.png")
ddim_scheduler = DDIM(num_inference_steps=50)
sd15 = StableDiffusion_1(scheduler=ddim_scheduler, device=device, dtype=torch.float16)
sd15.clip_text_encoder.load_from_safetensors("clip_text.safetensors")
sd15.lda.load_from_safetensors("lda.safetensors")
sd15.unet.load_from_safetensors("unet.safetensors")
with torch.no_grad():
    prompt = "best quality, high quality, wearing a hat on the beach"
    negative_prompt = "monochrome, lowres, bad anatomy, worst quality, low quality"

    ip_adapter = SD1IPAdapter(
        target=sd15.unet,
        weights=load_from_safetensors("ip-adapter-plus_sd15.safetensors"),
        fine_grained=True,
        scale=0.6,
    )
    ip_adapter.clip_image_encoder.load_from_safetensors("clip_image.safetensors")
    ip_adapter.inject()  # replace the UNet's cross-attentions with decoupled cross-attentions

    clip_text_embedding = sd15.compute_clip_text_embedding(text=prompt, negative_text=negative_prompt)
    clip_image_embedding = ip_adapter.compute_clip_image_embedding(ip_adapter.preprocess_image(image))

    # Both embeddings are batched as (negative, conditional): concatenate the image
    # tokens to the text tokens along the sequence dimension for each half.
    negative_text_embedding, conditional_text_embedding = clip_text_embedding.chunk(2)
    negative_image_embedding, conditional_image_embedding = clip_image_embedding.chunk(2)
    clip_text_embedding = torch.cat(
        (
            torch.cat([negative_text_embedding, negative_image_embedding], dim=1),
            torch.cat([conditional_text_embedding, conditional_image_embedding], dim=1),
        )
    )

    manual_seed(42)
    x = torch.randn(1, 4, 64, 64, device=device, dtype=torch.float16)  # initial latents (512x512 output)

    # Standard denoising loop with classifier-free guidance
    for step in sd15.steps:
        x = sd15(
            x,
            step=step,
            clip_text_embedding=clip_text_embedding,
            condition_scale=7.5,
        )

    predicted_image = sd15.lda.decode_latents(x)

predicted_image.save("output.png")
print("done: see output.png")
Note: other variants of IP-Adapter are supported too (SDXL, with or without fine-grained features); see the sketch below.
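For instance, switching to SDXL is mostly a matter of swapping classes and weights. A minimal sketch, assuming the SDXL counterparts mirror the SD 1.5 API shown above (the StableDiffusion_XL / SDXLIPAdapter names and the weight filenames here are illustrative, not verbatim):

from refiners.foundationals.latent_diffusion import StableDiffusion_XL, SDXLIPAdapter

sdxl = StableDiffusion_XL(device=device, dtype=torch.float16)

# Same lifecycle as the SD 1.5 example: build the adapter, load its image encoder, inject.
ip_adapter = SDXLIPAdapter(
    target=sdxl.unet,
    weights=load_from_safetensors("ip-adapter-plus_sdxl.safetensors"),  # illustrative filename
    fine_grained=True,  # or False for the non-fine-grained variant
    scale=0.6,
)
ip_adapter.clip_image_encoder.load_from_safetensors("clip_image.safetensors")
ip_adapter.inject()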
A few more things:
SD1IPAdapter implements the IP-Adapter logic: it “targets” the UNet, into which it can be injected (= all cross-attentions are replaced with decoupled cross-attentions) and from which it can be ejected (= get back the original UNet), as sketched below.
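Here is a minimal sketch of that inject / eject lifecycle, reusing the objects from the example above (the only call not shown earlier is eject, the counterpart of inject):

# Injection patches the UNet in place: its cross-attentions become decoupled cross-attentions.
ip_adapter = SD1IPAdapter(
    target=sd15.unet,
    weights=load_from_safetensors("ip-adapter-plus_sd15.safetensors"),
)
ip_adapter.inject()
# ... run image-conditioned inference, as in the demo above ...
ip_adapter.eject()  # restore the original UNet, as if the adapter had never been there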
Feedback welcome!