Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
Chunting Zhou, Lili Yu, Arun Babu, Kushal Tirumala, Michihiro Yasunaga, Leonid Shamis, Jacob Kahn, Xuezhe Ma, Luke Zettlemoyer, Omer Levy
Meta, Waymo, University of Southern California
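The single model mixes two attention patterns. Every position attends causally to earlier positions in the sequence, and patches that belong to the same image also attend to each other bidirectionally. Below is a minimal pure-Python sketch of that mask, not code from this repository. It assumes a hypothetical encoding in which each position is labeled None for a text token or with an integer image index for an image patch, and transfusion_mask is an illustrative name.

```python
from typing import List, Optional


def transfusion_mask(image_ids: List[Optional[int]]) -> List[List[bool]]:
    """Build an n x n attention mask; mask[i][j] is True if position i may attend to j.

    image_ids[k] is None for a text token, or an integer identifying which image
    the patch at position k belongs to (a hypothetical labeling scheme).
    """
    n = len(image_ids)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            causal = j <= i  # standard causal attention over the whole sequence
            same_image = image_ids[i] is not None and image_ids[i] == image_ids[j]
            # Patches of the same image see each other in both directions.
            mask[i][j] = causal or same_image
    return mask


# One text token, a two-patch image, then another text token.
for row in transfusion_mask([None, 0, 0, None]):
    print("".join("1" if allowed else "." for allowed in row))
# 1...
# 111.
# 111.
# 1111
```

Text tokens that follow an image see all of its patches through ordinary causal attention. Only attention inside an image is bidirectional, so a patch can look ahead to later patches of its own image but never to later text.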
requirements.txt will be released later.
conda create -n transfusion python=3.10
conda activate transfusion
pip install -r requirements.txt
python sample.py --model_name /path/to/pretrained_model
accelerate launch --mixed_precision fp16 train.py --data_path /path/to/ImageNet/train
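The paper trains one model with two objectives. Text tokens receive a next-token language-modeling loss and image latents receive a diffusion (DDPM noise-prediction) loss, combined as L = L_LM + λ·L_DDPM. The following is a minimal pure-Python sketch of that objective under stated assumptions, not the code in train.py. All function names and the toy list-of-floats inputs are hypothetical, the transformer that produces the logits and noise predictions is omitted, alpha_bar is assumed to come from some noise schedule, and λ (lam) is left to the caller with no claim about the paper's setting.

```python
import math
from typing import List, Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def lm_loss(logits: List[List[float]], targets: List[int]) -> float:
    """Mean next-token cross-entropy over the text positions."""
    return sum(cross_entropy(row, t) for row, t in zip(logits, targets)) / len(targets)


def add_noise(x0: List[float], eps: List[float], alpha_bar: float) -> List[float]:
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps.

    alpha_bar is the cumulative noise-schedule value at timestep t (assumed given).
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def ddpm_loss(eps_pred: List[float], eps: List[float]) -> float:
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


def transfusion_loss(text_logits: List[List[float]], text_targets: List[int],
                     eps_pred: List[float], eps: List[float], lam: float) -> float:
    """L = L_LM + lam * L_DDPM; lam balances the two terms."""
    return lm_loss(text_logits, text_targets) + lam * ddpm_loss(eps_pred, eps)


# Toy usage with made-up numbers; in practice a transformer would produce
# text_logits from the text tokens and eps_pred from the noised latents x_t.
x_t = add_noise(x0=[1.0, -0.5], eps=[0.3, 0.1], alpha_bar=0.9)
loss = transfusion_loss([[0.0, 0.0]], [0], eps_pred=[0.0, 0.0], eps=[1.0, -1.0], lam=2.0)
print(round(loss, 4))  # 2.6931
```

Both losses flow through the same set of weights, which is what lets a single transformer serve as both the language model and the diffusion model.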
The code is heavily inspired by the following repositories:
Other repositories that implement the same work:
@misc{zhou2024transfusionpredicttokendiffuse,
title={Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model},
author={Chunting Zhou and Lili Yu and Arun Babu and Kushal Tirumala and Michihiro Yasunaga and Leonid Shamis and Jacob Kahn and Xuezhe Ma and Luke Zettlemoyer and Omer Levy},
year={2024},
eprint={2408.11039},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2408.11039},
}