hutaiHang / ToMe

[NeurIPS 2024] Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
https://arxiv.org/abs/2411.07132

📑 Introduction

Taihang Hu, Linxuan Li, Joost van de Weijer, Hongcheng Gao, Fahad Khan, Jian Yang, Ming-Ming Cheng, Kai Wang, Yaxing Wang

📚 arXiv: https://arxiv.org/abs/2411.07132

This paper defines semantic binding as the task of associating an object with its attribute (attribute binding) or linking it to related sub-objects (object binding). We propose a novel method called Token Merging (ToMe), which enhances semantic binding by aggregating relevant tokens into a single composite token, aligning the object, its attributes, and sub-objects in the same cross-attention map.
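
To make the idea of a composite token concrete, here is a minimal sketch in plain Python. It assumes the composite is formed by element-wise summation of the embeddings of the object and its related tokens; that aggregation rule, the toy 3-dimensional embeddings, and the helper name `merge_tokens` are all illustrative assumptions, not the repository's code. The actual method is described in the paper.

```python
from typing import List

Vector = List[float]


def merge_tokens(embeddings: List[Vector], group: List[int]) -> Vector:
    """Sum the embeddings at the given positions into one composite vector.

    Assumption: element-wise summation stands in for the aggregation step.
    """
    dim = len(embeddings[group[0]])
    composite = [0.0] * dim
    for idx in group:
        for d in range(dim):
            composite[d] += embeddings[idx][d]
    return composite


# Toy embeddings for the words of "a cat wearing sunglasses" (made-up values).
tokens = ["a", "cat", "wearing", "sunglasses"]
embeddings = [
    [0.1, 0.0, 0.0],  # a
    [0.0, 1.0, 0.0],  # cat (object)
    [0.0, 0.0, 0.5],  # wearing
    [0.5, 0.0, 1.0],  # sunglasses (sub-object)
]

# Merge the object with its related tokens into a single composite token.
composite = merge_tokens(embeddings, [1, 2, 3])
print(composite)
```

In this sketch the single composite vector replaces the three original tokens, so one cross-attention map would cover the object and its sub-object together.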

For technical details, please refer to our paper.

🚀 Usage

  1. Environment Setup

    Create and activate the Conda virtual environment:

    conda env create -f environment.yaml
    conda activate tome

    Alternatively, install dependencies via pip:

    pip install -r requirements.txt

    Additionally, download the SpaCy model for syntax parsing:

    python -m spacy download en_core_web_trf

  2. Configure Parameters

    Modify the configs/demo_config.py file to adjust runtime parameters as needed. This file includes two example configuration classes: RunConfig1 for object binding and RunConfig2 for attribute binding. Key parameters are as follows:

    • prompt: Text prompt for guiding image generation.
    • model_path: Path to the Stable Diffusion model; set to None to download the pretrained model automatically.
    • use_nlp: Whether to use an NLP model for token parsing.
    • token_indices: Indices of tokens to merge.
    • prompt_anchor: The text prompt, split into parts.
    • prompt_merged: Text prompt after token merging.

    For further parameter details, please refer to the comments in the configuration file and our paper.

  3. Run the Example

    Execute the main script run_demo.py:

    python run_demo.py

    The generated images will be saved in the demo directory.
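
The parameters listed in step 2 can be pictured as a dataclass. The sketch below is hypothetical: the field names come from the list above, but the class name `ExampleRunConfig`, the types, the example values, the nesting of `token_indices`, and the `*` marker in `prompt_merged` are assumptions for illustration and are not copied from configs/demo_config.py. The helper `numbered_words` is also hypothetical; it assumes each word maps to exactly one token and that position 0 is taken by a start-of-text token, which may not hold for every prompt.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ExampleRunConfig:
    """Hypothetical config class; not copied from configs/demo_config.py."""

    # Text prompt guiding image generation.
    prompt: str = "a cat wearing sunglasses and a dog wearing hat"
    # Path to the Stable Diffusion model; None means download automatically.
    model_path: Optional[str] = None
    # Whether to use an NLP model for token parsing.
    use_nlp: bool = False
    # Indices of tokens to merge. The nesting is an assumption: one entry per
    # object, pairing the object position with the positions merged into it.
    token_indices: List[List[List[int]]] = field(
        default_factory=lambda: [[[2], [3, 4]], [[7], [8, 9]]]
    )
    # The prompt split into parts (assumed here: one part per object).
    prompt_anchor: List[str] = field(
        default_factory=lambda: ["a cat wearing sunglasses", "a dog wearing hat"]
    )
    # Prompt after token merging (the "*" marker is illustrative only).
    prompt_merged: str = "a cat* and a dog*"


def numbered_words(prompt: str) -> List[Tuple[int, str]]:
    """Number words from 1, assuming one token per word and a start token at 0."""
    return list(enumerate(prompt.split(), start=1))


config = ExampleRunConfig()
for position, word in numbered_words(config.prompt):
    print(position, word)
```

Printing the numbered words is a quick way to read off candidate values for `token_indices` by hand when `use_nlp` is disabled. Check the comments in the configuration file for the exact format the scripts expect.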

📸 Example Outputs

If everything is set up correctly, RunConfig1 and RunConfig2 should produce the left and right images below, respectively:

⚠️ Notes

🙏 Acknowledgments

This project builds upon valuable work and resources from the following repositories:

We extend our sincere thanks to the creators of these projects for their contributions to the field and for making their code available. 🙌

BibTeX

@inproceedings{hu2024token,
  title={Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis},
  author={Taihang Hu and Linxuan Li and Joost van de Weijer and Hongcheng Gao and Fahad Khan and Jian Yang and Ming-Ming Cheng and Kai Wang and Yaxing Wang},
  booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
  year={2024},
  url={https://openreview.net/forum?id=tRRWoa9e80}
}