Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
Taihang Hu, Linxuan Li, Joost van de Weijer, Hongcheng Gao, Fahad Khan, Jian Yang, Ming-Ming Cheng, Kai Wang, Yaxing Wang
This paper defines semantic binding as the task of associating an object with its attribute (attribute binding) or linking it to related sub-objects (object binding). We propose a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token, aligning the object, its attributes, and sub-objects in the same cross-attention map.
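To make the core idea concrete, the toy sketch below merges the embeddings of an object token and its related tokens into one composite token. It is not the ToMe implementation. The embeddings are plain Python lists, the function name `merge_tokens` is hypothetical, and the aggregation rule (an element-wise mean, written at the object token's position) is an assumption that may differ from what the paper actually uses.

```python
# Toy sketch of token merging; NOT the official ToMe code.
# Assumptions: embeddings are plain lists of floats, the first index in each
# group is the object token, and the composite is the element-wise mean.
from typing import Dict, List, Sequence, Set

Vector = List[float]


def merge_tokens(embeddings: Sequence[Vector],
                 groups: Sequence[Sequence[int]]) -> List[Vector]:
    """Replace each group of token positions with one composite embedding.

    The composite is written at the group's first (object) position and the
    other positions in the group are removed from the sequence.
    """
    anchors: Dict[int, List[int]] = {}
    dropped: Set[int] = set()
    seen: Set[int] = set()
    for group in groups:
        if not group or len(set(group)) != len(group):
            raise ValueError("each group needs distinct token positions")
        if min(group) < 0 or max(group) >= len(embeddings):
            raise IndexError("token position out of range")
        if seen.intersection(group):
            raise ValueError("token groups must not overlap")
        seen.update(group)
        anchors[group[0]] = list(group)
        dropped.update(group[1:])

    merged: List[Vector] = []
    for pos, emb in enumerate(embeddings):
        if pos in dropped:
            continue  # folded into a composite token
        if pos in anchors:
            members = [embeddings[i] for i in anchors[pos]]
            # Element-wise mean across the group's embeddings.
            merged.append([sum(vals) / len(members) for vals in zip(*members)])
        else:
            merged.append(list(emb))
    return merged


# "a cat wearing sunglasses" with toy 2-D embeddings.
embeddings = [[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [0.0, 0.0]]
print(merge_tokens(embeddings, [[1, 2, 3]]))  # [[0.0, 0.0], [1.0, 1.0]]
```

Here "cat", "wearing" and "sunglasses" collapse into a single token, so the object and its attribute are represented by one token and therefore one cross-attention map, while "a" is left untouched.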
For technical details, please refer to our paper.
Environment Setup
Create and activate the Conda virtual environment:
conda env create -f environment.yaml
conda activate tome
Alternatively, install dependencies via pip:
pip install -r requirements.txt
Additionally, download the spaCy model used for syntax parsing:
python -m spacy download en_core_web_trf
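After installation, a quick sanity check is to confirm that the main packages can be found without actually importing them. This is a sketch: the module list below is an assumption about what environment.yaml and requirements.txt install (torch, diffusers and transformers are guesses for a Stable Diffusion pipeline), and `check_modules` is a hypothetical helper, not part of this repository.

```python
# Environment sanity check; the module list is an assumption about what
# environment.yaml / requirements.txt install, not taken from those files.
import importlib.util
from typing import Dict, Iterable

DEFAULT_MODULES = ("torch", "diffusers", "transformers", "spacy", "en_core_web_trf")


def check_modules(modules: Iterable[str] = DEFAULT_MODULES) -> Dict[str, bool]:
    """Report whether each top-level module is installed, without importing it."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, ok in check_modules().items():
        print(f"{name:20s} {'ok' if ok else 'MISSING'}")
```

The spaCy model downloaded above is installed as a regular Python package, which is why `en_core_web_trf` can be looked up like any other module.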
Configure Parameters
Modify the configs/demo_config.py file to adjust runtime parameters as needed. This file includes two example configuration classes: RunConfig1 for object binding and RunConfig2 for attribute binding. Key parameters are as follows:
- prompt: Text prompt that guides image generation.
- model_path: Path to the Stable Diffusion model; set to None to download the pretrained model automatically.
- use_nlp: Whether to use an NLP model (spaCy) for token parsing.
- token_indices: Indices of the tokens to merge.
- prompt_anchor: The text prompt split into separate segments.
- prompt_merged: Text prompt after token merging.
Run the Example
Execute the main script run_demo.py:
python run_demo.py
The generated images are saved in the demo directory.
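To inspect the results programmatically, a small helper can collect the image files from the demo directory. This is a sketch under assumptions: the README does not specify file names or formats, so the helper (the name `list_generated_images` is hypothetical) simply searches the directory recursively for PNG and JPEG files.

```python
# Collect images written by run_demo.py. Assumptions: images are saved as
# PNG/JPEG files somewhere under the demo directory (file naming unknown).
from pathlib import Path
from typing import List, Union

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def list_generated_images(demo_dir: Union[str, Path] = "demo") -> List[Path]:
    """Return image files under demo_dir (recursively), sorted by path."""
    root = Path(demo_dir)
    if not root.is_dir():
        return []  # nothing generated yet
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


if __name__ == "__main__":
    for path in list_generated_images():
        print(path)
```

Matching suffixes case-insensitively keeps files such as `result.PNG` from being missed.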
If everything is set up correctly, RunConfig1 and RunConfig2 should produce the left and right images below, respectively:
[Figure: example outputs of RunConfig1 (left) and RunConfig2 (right)]
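As a rough illustration of how the parameters listed under Configure Parameters fit together, here is a hypothetical configuration class with a basic consistency check. The field names come from the parameter list; everything else is an assumption: the class name, field types, example values, and the index convention (each group lists the object word first, then its attribute words, counting words from 1 because position 0 is taken to be the start-of-text token). Compare against RunConfig1 and RunConfig2 in configs/demo_config.py before relying on any of it.

```python
# Hypothetical config; field names follow the README's parameter list, but
# types, defaults and the index convention are assumptions.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HypotheticalRunConfig:
    prompt: str = "a red car and a blue bench"
    model_path: Optional[str] = None  # None: download the pretrained model
    use_nlp: bool = False  # assumed: when False, token_indices are set by hand
    # Assumed layout: one group per object, object index first, then its
    # attribute indices; words count from 1 (position 0 = start-of-text token).
    token_indices: List[List[int]] = field(default_factory=lambda: [[3, 2], [7, 6]])
    prompt_anchor: List[str] = field(default_factory=lambda: ["a red car", "a blue bench"])
    prompt_merged: str = "a car and a bench"  # illustrative value only

    def selected_words(self) -> List[List[str]]:
        """Map each index group back to words, assuming one token per word."""
        words = self.prompt.split()
        return [[words[i - 1] for i in group] for group in self.token_indices]

    def validate(self) -> None:
        """Raise ValueError if an index falls outside the prompt's words."""
        n_words = len(self.prompt.split())
        for group in self.token_indices:
            for i in group:
                if not 1 <= i <= n_words:
                    raise ValueError(f"token index {i} outside 1..{n_words}")
        if not self.prompt_merged.strip():
            raise ValueError("prompt_merged must not be empty")


cfg = HypotheticalRunConfig()
cfg.validate()
print(cfg.selected_words())  # [['car', 'red'], ['bench', 'blue']]
```

The CLIP tokenizer used by Stable Diffusion can split one word into several tokens, so the one-token-per-word assumption in `selected_words` is only a quick check for simple prompts, not a replacement for inspecting the tokenizer output.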
To run other examples, modify configs/demo_config.py and make the necessary adjustments in run_demo.py.
Acknowledgements
This project builds upon valuable work and resources from the following repositories:
We extend our sincere thanks to the creators of these projects for their contributions to the field and for making their code available.
Citation
@inproceedings{hu2024token,
title={Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis},
author={Taihang Hu and Linxuan Li and Joost van de Weijer and Hongcheng Gao and Fahad Khan and Jian Yang and Ming-Ming Cheng and Kai Wang and Yaxing Wang},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=tRRWoa9e80}
}