jefferyZhan / Griffon

【ECCV2024】The official repo of Griffon series
Apache License 2.0

# Welcome to Griffon

This is the official repo of the Griffon series (v1 & v2). Griffon is the first high-resolution (over 1K) LVLM capable of localizing everything you are interested in and describing the region you specify. The latest version supports visual-language co-referring: you can refer to a target with an image or a text description. Griffon achieves excellent performance in referring expression comprehension (REC), object detection, object counting, visual/phrase grounding, and referring expression generation (REG).


Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models

📕Paper 🌀Usage 🤗Model

Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring

📕Paper

Griffon-G: More General, More Tasks, and Better Performance!

Coming in a few days!

## News

## What can Griffon do now?

Griffon v2 can now perform localization from free-form text inputs and from visual target inputs given as locally cropped images. This covers tasks such as REC, object detection, object counting, visual/phrase grounding, and REG. More quantitative evaluation results can be found in our paper.
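
As an illustration of what a "locally cropped" visual target could look like in practice, here is a minimal sketch that turns a rough pixel box into a valid crop rectangle. It is not Griffon's own preprocessing code: the helper name `clamp_crop_box` and the `(x1, y1, x2, y2)` pixel convention are assumptions made for this example only.

```python
import math
from typing import Tuple

# Assumed convention for this sketch: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def clamp_crop_box(box: Box, width: int, height: int) -> Tuple[int, int, int, int]:
    """Clamp a pixel box to the image bounds and round it outward to integers.

    Hypothetical helper for illustration; not part of the Griffon codebase.
    """
    x1, y1, x2, y2 = box
    # Tolerate corners given in either order.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    # Round outward so the crop fully contains the requested region,
    # then clip to the image so the rectangle is always valid.
    left = max(0, math.floor(left))
    top = max(0, math.floor(top))
    right = min(width, math.ceil(right))
    bottom = min(height, math.ceil(bottom))
    if right <= left or bottom <= top:
        raise ValueError("box does not overlap the image")
    return left, top, right, bottom
```

The returned tuple follows the `(left, upper, right, lower)` order, so it could be passed to an image library's crop routine (for example Pillow's `Image.crop`) to produce the target image. How that crop is then fed to the model depends on the released usage instructions.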

## Acknowledgement

## Citation

If you find Griffon useful for your research and applications, please cite using this BibTeX:

@misc{zhan2023griffon,
      title={Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models}, 
      author={Yufei Zhan and Yousong Zhu and Zhiyang Chen and Fan Yang and Ming Tang and Jinqiao Wang},
      year={2023},
      eprint={2311.14552},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{zhan2024griffon,
      title={Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring}, 
      author={Yufei Zhan and Yousong Zhu and Hongyin Zhao and Fan Yang and Ming Tang and Jinqiao Wang},
      year={2024},
      eprint={2403.09333},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

## License


The data and checkpoints are licensed for research use only. All of them are also restricted to uses that follow the license agreements of LLaVA, LLaMA, and GPT-4. The dataset is CC BY-NC 4.0 (allowing only non-commercial use), and models trained using the dataset should not be used outside of research purposes.