This repository is the PyTorch implementation of the paper:
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models (CVPR 2024)
Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
The following code is based on the Stable Diffusion repository.
The PH2P code modifies the following modules:
- Clip_transformer to localclip_transformer
- LocalCustomTokenEmbedding with the projection algorithm in main_textual_inversion.py

Download the Stable Diffusion v1 checkpoint and place it at models/ldm/stable-diffusion-v1/model.ckpt.
To run prompt inversion, specify the image path in inversion_config.json and run:

python main_textual_inversion.py
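For illustration, a minimal inversion_config.json might look like the following; the key name image_path is a guess based on the instruction above, so check the config file shipped with the repository for the actual schema:

```json
{
  "image_path": "path/to/target_image.png"
}
```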
The prompts will be saved in ./logs_forward_pass/.
The best prompt for a given image is the one whose generated image has the maximum CLIP similarity with the target image. This step additionally requires transformers 4.25.1 and diffusers 0.12.1. Run:

python get_best_text.py
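The selection step above can be sketched as follows. This is a minimal sketch, not the repository's actual get_best_text.py: it embeds images with the transformers CLIP API (the openai/clip-vit-base-patch32 checkpoint is an assumption; the paper's setup may use a different CLIP variant) and picks the candidate prompt whose generated image is closest to the target in CLIP space.

```python
import torch


def best_prompt_index(target_feat: torch.Tensor, generated_feats: torch.Tensor) -> int:
    """Return the index of the generated image most similar to the target.

    target_feat: (1, d) CLIP embedding of the target image.
    generated_feats: (n, d) CLIP embeddings, one per candidate prompt.
    """
    # L2-normalize so the dot product equals cosine similarity.
    target = target_feat / target_feat.norm(dim=-1, keepdim=True)
    gen = generated_feats / generated_feats.norm(dim=-1, keepdim=True)
    sims = gen @ target.squeeze(0)  # one cosine similarity per candidate
    return int(sims.argmax().item())


def clip_image_features(images) -> torch.Tensor:
    """Embed a list of PIL images with CLIP (hypothetical checkpoint choice)."""
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        return model.get_image_features(**inputs)
```

With features in hand, best_prompt_index(clip_image_features([target]), clip_image_features(generated)) gives the index of the winning prompt.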
@inproceedings{ph2p2024cvpr,
title = {Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models},
author = {Mahajan, Shweta and Rahman, Tanzila and Yi, Kwang Moo and Sigal, Leonid},
booktitle = {CVPR},
year = {2024}
}