
PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation

[Rongyao Fang](https://scholar.google.com/citations?user=FtH3CW4AAAAJ&hl=en)1\*, [Chengqi Duan](https://scholar.google.com/citations?user=r9qb4ZwAAAAJ&hl=zh-CN)2\*, Kun Wang3, [Hao Li](https://scholar.google.com/citations?user=qHqQsY4AAAAJ&hl=zh-CN)1,4, Hao Tian3, Xingyu Zeng3, Rui Zhao3, [Jifeng Dai](https://jifengdai.org/)4,5, [Hongsheng Li](https://www.ee.cuhk.edu.hk/~hsli/)1 :envelope:, [Xihui Liu](https://xh-liu.github.io/)2 :envelope:

1CUHK MMLab, 2HKU MMLab, 3SenseTime, 4Shanghai AI Laboratory, 5Tsinghua University

\*Equal contribution, :envelope: Corresponding authors

:fire: We will release the code and models soon!

:book: Table of Contents

- Abstract
- Framework
- Multi-granular Semantic Visual Decoding
- Diverse Text-to-image Generation
- Image Editing
- Image Conditional Generation
- Citation
- License
- Contact

Abstract

PUMA introduces a unified multimodal large language model framework designed to integrate multi-granular visual generation and understanding. Our model excels in a variety of visual tasks, including diverse text-to-image generation, precise image editing, conditional image generation, and visual understanding. It strikes a balance between generation diversity and controllability, making it a versatile tool for visual tasks.

Read the full paper here.

Framework

Multi-granular Semantic Visual Decoding
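Until the code is released, the sketch below gives a rough, illustrative picture of the multi-granular idea (it is *not* PUMA's implementation, and all names are hypothetical): dense image features are pooled to several granularities, and each granularity is handled by its own decoder, with coarser levels favoring generation diversity and finer levels favoring controllability.

```python
# Hypothetical sketch of multi-granular visual decoding, NOT PUMA's released code.
# Assumption: an image encoder has already produced dense features of shape (B, C, H, W).
import torch
import torch.nn as nn

class MultiGranularDecoder(nn.Module):
    def __init__(self, dim=768, scales=(1, 2, 4, 8)):
        super().__init__()
        self.scales = scales
        # One lightweight decoder per granularity (illustrative choice of module).
        self.decoders = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            for _ in scales
        )

    def forward(self, feats):
        # feats: (B, C, H, W) dense image features from some encoder.
        outputs = []
        for scale, decoder in zip(self.scales, self.decoders):
            # Coarser granularity -> fewer tokens -> looser, more diverse generation;
            # finer granularity -> more tokens -> tighter, more controllable generation.
            pooled = nn.functional.adaptive_avg_pool2d(feats, scale)  # (B, C, s, s)
            tokens = pooled.flatten(2).transpose(1, 2)                # (B, s*s, C)
            outputs.append(decoder(tokens))
        return outputs

feats = torch.randn(2, 768, 16, 16)
for out in MultiGranularDecoder()(feats):
    print(out.shape)  # (2, 1, 768), (2, 4, 768), (2, 16, 768), (2, 64, 768)
```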

Diverse Text-to-image Generation

Image Editing

Image Conditional Generation

Citation

If you find PUMA useful in your research, please consider citing us:

@article{fang2024puma,
  title   = {PUMA: Empowering Unified MLLM with Multi-Granular Visual Generation},
  author  = {Rongyao Fang and Chengqi Duan and Kun Wang and Hao Li and Hao Tian and Xingyu Zeng and Rui Zhao and Jifeng Dai and Hongsheng Li and Xihui Liu},
  journal = {arXiv preprint},
  year    = {2024}
}

License

This project is released under the Apache 2.0 license.

Contact

If you have any questions, please feel free to contact rongyaofang@gmail.com.
Rongyao Fang expects to graduate in 2025 and is open to both academic and industrial research positions; if you are interested, please feel free to reach out.