lavinal712 / Transfusion

MIT License

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Unofficial PyTorch Implementation

Paper

Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
Meta, Waymo, University of Southern California

Setup

requirements.txt has not been released yet. Once it is available, create the environment and install the dependencies:

conda create -n transfusion python=3.10
conda activate transfusion
pip install -r requirements.txt

Sampling

python sample.py --model_name /path/to/pretrained_model

Training

accelerate launch --mixed_precision fp16 train.py --data_path /path/to/ImageNet/train

Acknowledgments

The code is heavily inspired by the following repositories:

Other repositories that implement the same work:

Citation

@misc{zhou2024transfusionpredicttokendiffuse,
      title={Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model}, 
      author={Chunting Zhou and Lili Yu and Arun Babu and Kushal Tirumala and Michihiro Yasunaga and Leonid Shamis and Jacob Kahn and Xuezhe Ma and Luke Zettlemoyer and Omer Levy},
      year={2024},
      eprint={2408.11039},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2408.11039}, 
}