Invertible Consistency Distillation for
Text-Guided Image Editing in Around 7 Steps

This paper proposes invertible Consistency Distillation (iCD), enabling:

- highly efficient and accurate text-guided image editing
- diverse and high-quality image generation

Installation
Easy-to-run examples (iCD-SD1.5)
- Generation
- Editing
Easy-to-run examples (iCD-SDXL)
In-depth generation and editing (iCD-SDXL and iCD-SD1.5)
iCD training example (iCD-SDXL and iCD-SD1.5)
Citation

Installation

# Clone the repo and enter it
git clone https://github.com/yandex-research/invertible-cd
cd invertible-cd

# Create an environment and install packages
conda create -n icd python=3.10 -y
conda activate icd

pip3 install -r requirements/req.txt

We provide the following checkpoints:

Guidance distilled diffusion models
- Stable Diffusion 1.5, 3GB
- SDXL, 8.9GB

These models are saved as .pt files.

Invertible Consistency Distillation (forward and reverse CD) on top of the guidance distilled models

Model, size	Steps	Time steps
iCD-SD1.5, 0.5GB	4	Reverse: [259, 519, 779, 999]; Forward: [19, 259, 519, 779]
iCD-SD1.5, 0.5GB	4	Reverse: [249, 499, 699, 999]; Forward: [19, 249, 499, 699]
iCD-SD1.5, 0.5GB	3	Reverse: [339, 699, 999]; Forward: [19, 339, 699]
iCD-SDXL, 1.4GB	4	Reverse: [259, 519, 779, 999]; Forward: [19, 259, 519, 779]
iCD-SDXL, 1.4GB	4	Reverse: [249, 499, 699, 999]; Forward: [19, 249, 499, 699]
iCD-SDXL, 1.4GB	3	Reverse: [339, 699, 999]; Forward: [19, 339, 699]

These models are saved as .safetensors files.
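
In every row of the table above, the forward schedule is the reverse schedule shifted by one boundary: it starts at 19 and omits the final 999. The sketch below spells out that relationship. Both `forward_from_reverse` and `checkpoint_name` are hypothetical helpers that are not part of the repository, and the file-name pattern is only inferred from the single iCD-SD1.5 example used in the loading step, so check it against the files you actually download.

```python
from typing import List

START_TIMESTEP = 19  # first forward boundary in every row of the table


def forward_from_reverse(reverse: List[int], start: int = START_TIMESTEP) -> List[int]:
    """Prepend `start` and drop the last reverse boundary."""
    return [start] + list(reverse[:-1])


def checkpoint_name(direction: str, timesteps: List[int], prefix: str = "iCD-SD15") -> str:
    """Hypothetical helper: build a name such as
    iCD-SD15-forward_19_259_519_779.safetensors (pattern assumed from one example)."""
    if direction not in ("forward", "reverse"):
        raise ValueError(f"unknown direction: {direction!r}")
    joined = "_".join(str(t) for t in timesteps)
    return f"{prefix}-{direction}_{joined}.safetensors"


reverse = [259, 519, 779, 999]
forward = forward_from_reverse(reverse)
print(forward)
print(checkpoint_name("forward", forward))
print(checkpoint_name("reverse", reverse))
```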

Easy-to-run examples

Step 0. Download the models and put them in the checkpoints folder

For this example, we consider iCD-SD1.5 using reverse: [259, 519, 779, 999], forward: [19, 259, 519, 779] time steps.
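
Before loading, it can help to confirm that the three files this example expects are actually in place. The following is a minimal sketch using only the standard library; `missing_checkpoints` is a hypothetical helper, and the file names are the ones passed to `load_models` in Step 1.

```python
from pathlib import Path
from typing import Iterable, List


def missing_checkpoints(root: str, names: Iterable[str]) -> List[str]:
    """Return the names that are not present as regular files under `root`."""
    root_dir = Path(root)
    return [name for name in names if not (root_dir / name).is_file()]


expected = [
    "iCD-SD15-forward_19_259_519_779.safetensors",
    "iCD-SD15-reverse_259_519_779_999.safetensors",
    "sd15_cfg_distill.pt",
]

missing = missing_checkpoints("checkpoints", expected)
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All checkpoints found.")
```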

Step 1. Load the models

from utils.loading import load_models
from diffusers import DDPMScheduler

root = 'checkpoints'
ldm_stable, reverse_cons_model, forward_cons_model = load_models(
    model_id="runwayml/stable-diffusion-v1-5",
    device='cuda',
    forward_checkpoint=f'{root}/iCD-SD15-forward_19_259_519_779.safetensors',
    reverse_checkpoint=f'{root}/iCD-SD15-reverse_259_519_779_999.safetensors',
    r=64,
    w_embed_dim=512,
    teacher_checkpoint=f'{root}/sd15_cfg_distill.pt',
)

tokenizer = ldm_stable.tokenizer
noise_scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler", )

Step 2. Specify the configuration according to the downloaded model

from utils import p2p, generation

NUM_REVERSE_CONS_STEPS = 4
REVERSE_TIMESTEPS = [259, 519, 779, 999]
NUM_FORWARD_CONS_STEPS = 4
FORWARD_TIMESTEPS = [19, 259, 519, 779]
NUM_DDIM_STEPS = 50

solver = generation.Generator(
    model=ldm_stable,
    noise_scheduler=noise_scheduler,
    n_steps=NUM_DDIM_STEPS,
    forward_cons_model=forward_cons_model,
    forward_timesteps=FORWARD_TIMESTEPS,
    reverse_cons_model=reverse_cons_model,
    reverse_timesteps=REVERSE_TIMESTEPS,
    num_endpoints=NUM_REVERSE_CONS_STEPS,
    num_forward_endpoints=NUM_FORWARD_CONS_STEPS,
    max_forward_timestep_index=49,
    start_timestep=19)

p2p.NUM_DDIM_STEPS = NUM_DDIM_STEPS
p2p.tokenizer = tokenizer
p2p.device = 'cuda'
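
Because the step counts and time-step lists must match the downloaded checkpoints, a quick sanity check can catch copy-paste mistakes before any model runs. This is a sketch only: `check_schedule` is a hypothetical helper, and the upper bound of 999 is an assumption taken from the largest value in the checkpoint table.

```python
from typing import Sequence


def check_schedule(name: str, timesteps: Sequence[int], num_steps: int,
                   max_timestep: int = 999) -> None:
    """Raise ValueError if the schedule is inconsistent with its step count."""
    if len(timesteps) != num_steps:
        raise ValueError(f"{name}: expected {num_steps} time steps, got {len(timesteps)}")
    if any(b <= a for a, b in zip(timesteps, timesteps[1:])):
        raise ValueError(f"{name}: time steps must be strictly increasing")
    if any(t < 0 or t > max_timestep for t in timesteps):
        raise ValueError(f"{name}: time steps must lie in [0, {max_timestep}]")


# Same values as in the configuration above.
check_schedule("reverse", [259, 519, 779, 999], 4)
check_schedule("forward", [19, 259, 519, 779], 4)
print("Schedules look consistent.")
```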

Generation with iCD-SD1.5

Step 3. Generate

import torch

prompt = ['a cute owl with a graduation cap']
controller = p2p.AttentionStore()

generator = torch.Generator().manual_seed(150)
tau = 1.0
image, _ = generation.runner(
    # Playing params
    guidance_scale=19.0,
    tau1=tau,  # Dynamic guidance if tau < 1.0
    tau2=tau,

    # Fixed params
    is_cons_forward=True,
    model=reverse_cons_model,
    w_embed_dim=512,
    solver=solver,
    prompt=prompt,
    controller=controller,
    generator=generator,
    latent=None,
    return_type='image')

# Save and display the generated image.
generation.to_pil_images(image).save('test_generation_iCD-SD1.5.jpg')
generation.view_images(image)

Editing with iCD-SD1.5

Step 3. Load and invert a real image

from utils import inversion

image_path = "assets/bird.jpg"
prompt = ["a photo of a bird standing on a branch"]

(image_gt, image_rec), ddim_latent, uncond_embeddings = inversion.invert(
    # Playing params
    image_path=image_path,
    prompt=prompt,

    # Fixed params
    is_cons_inversion=True,
    w_embed_dim=512,
    inv_guidance_scale=0.0,
    stop_step=50,
    solver=solver,
    seed=10500)

Step 4. Edit the image

p2p.NUM_DDIM_STEPS = 4
p2p.tokenizer = tokenizer
p2p.device = 'cuda'

prompts = ["a photo of a bird standing on a branch",
           "a photo of a lego bird standing on a branch"
           ]

# Playing params
cross_replace_steps = {'default_': 0.2, }
self_replace_steps = 0.2
blend_word = (('bird',), ('lego',))
eq_params = {"words": ("lego",), "values": (3.,)}

controller = p2p.make_controller(prompts,
                                 False, # (is_replacement) True if only one word is changed
                                 cross_replace_steps,
                                 self_replace_steps,
                                 blend_word,
                                 eq_params)

tau = 0.8
image, _ = generation.runner(
    # Playing params
    guidance_scale=19.0,
    tau1=tau,  # Dynamic guidance if tau < 1.0
    tau2=tau,

    # Fixed params
    model=reverse_cons_model,
    is_cons_forward=True,
    w_embed_dim=512,
    solver=solver,
    prompt=prompts,
    controller=controller,
    num_inference_steps=50,
    generator=None,
    latent=ddim_latent,
    uncond_embeddings=uncond_embeddings,
    return_type='image')

generation.to_pil_images(image).save('test_editing_iCD-SD1.5.jpg')
generation.view_images(image)

Note:
Zero-shot editing is highly sensitive to hyperparameters. We therefore recommend tuning cross_replace_steps (from 0.0 to 1.0), self_replace_steps (from 0.0 to 1.0), tau (0.7 or 0.8 seems to work best), guidance_scale (up to 19), and the amplification factor (the "values" entry of eq_params).
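
One way to organize this tuning is to enumerate a small grid of settings and run the editing step once per combination. The sketch below only builds the grid. The candidate values are illustrative assumptions rather than recommendations from the authors, and `iter_configs` is a hypothetical helper.

```python
from itertools import product
from typing import Dict, Iterator, List

# Illustrative candidate values (assumptions); adjust to your image and prompt.
grid: Dict[str, List[float]] = {
    "cross_replace_steps": [0.2, 0.4, 0.6],
    "self_replace_steps": [0.2, 0.4, 0.6],
    "tau": [0.7, 0.8],
    "guidance_scale": [11.0, 15.0, 19.0],
}


def iter_configs(grid: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield one dict per combination of the values in `grid`."""
    keys = list(grid)
    for values in product(*(grid[key] for key in keys)):
        yield dict(zip(keys, values))


configs = list(iter_configs(grid))
print(len(configs), "configurations")
print(configs[0])
```

Each resulting dict can then be passed to `p2p.make_controller` and `generation.runner` as in Step 4. Since these settings are only used in the editing step, the latent obtained in Step 3 should be reusable across runs.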

You can also try the similar easy-to-run examples for the SDXL model, or move on to the in-depth examples.

Citation

@article{starodubcev2024invertible,
  title={Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps},
  author={Starodubcev, Nikita and Khoroshikh, Mikhail and Babenko, Artem and Baranchuk, Dmitry},
  journal={arXiv preprint arXiv:2406.14539},
  year={2024}
}
