paperswithlove / papers-we-read


Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs #37

runhani opened this issue 2 weeks ago

runhani commented 2 weeks ago

Paper: https://arxiv.org/pdf/2406.16860
Website: https://cambrian-mllm.github.io
Code: https://github.com/cambrian-mllm/cambrian
Models: https://huggingface.co/nyu-visionx/
Data: https://huggingface.co/datasets/nyu-visionx/Cambrian-10M
CV-Bench: https://huggingface.co/datasets/nyu-visionx/CV-Bench
Evaluation: https://github.com/cambrian-mllm/cambrian

Open cookbook for instruction-tuned MLLMs

Intro

Data

Benchmark

LLMs

Vision

Connector


runhani commented 2 weeks ago

Five key components of an MLLM:

LLM

Visual Encoder

Multimodal Connector

Data Curation Pipeline

Instruction Tuning Strategy

runhani commented 2 weeks ago

Who is actually answering the question: the LLM alone, or the MLLM using the image? (LLM vs MLLM)

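A minimal sketch of one way to probe this, under stated assumptions: score the same MLLM on each benchmark once with the image and once with the image withheld, then look at the gap. A small gap suggests the question can be answered from the text alone, i.e. by the LLM rather than through vision. The function name, the 5-point threshold, and the benchmark names and scores below are hypothetical placeholders, not numbers from the paper.

```python
def split_by_vision_dependence(enabled, disabled, threshold=5.0):
    """Compare accuracy with vs. without the image for one MLLM.

    enabled / disabled: dict mapping benchmark name -> accuracy (%),
    measured with the image provided and with it withheld.
    threshold: hypothetical cutoff (in accuracy points) below which a
    benchmark is treated as answerable from text alone.
    """
    # Positive gap = how much the image actually helps on that benchmark.
    gaps = {b: enabled[b] - disabled[b] for b in enabled}
    text_answerable = sorted(b for b, g in gaps.items() if g < threshold)
    vision_needed = sorted(b for b, g in gaps.items() if g >= threshold)
    return gaps, text_answerable, vision_needed


# Hypothetical scores for illustration only.
enabled = {"bench_A": 62.0, "bench_B": 55.0}
disabled = {"bench_A": 60.0, "bench_B": 30.0}
gaps, text_only, needs_vision = split_by_vision_dependence(enabled, disabled)
print(gaps, text_only, needs_vision)
```

The threshold is a judgment call; comparing the vision-disabled score against chance-level accuracy for multiple-choice benchmarks is another reasonable reference point.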

runhani commented 2 weeks ago

Let's cluster the benchmarks.

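A minimal sketch of one way to group benchmarks, assuming you have scores for the same set of models on every benchmark: treat each benchmark as a vector of per-model scores, use 1 - Pearson correlation as the distance, and merge with average-linkage agglomerative clustering. Benchmarks whose scores rise and fall together across models end up in the same group. This is a generic technique chosen for illustration, not necessarily the procedure used in Cambrian-1; the function names and toy scores are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # constant vector: treat as uncorrelated
    return cov / (sx * sy)


def cluster_benchmarks(scores, k):
    """Average-linkage agglomerative clustering of benchmarks.

    scores: dict benchmark name -> list of scores, one per model,
            with the same model order for every benchmark.
    k:      number of clusters to stop at.
    """
    names = list(scores)
    # Pairwise distance: 0 when perfectly correlated, 2 when anti-correlated.
    dist = {}
    for a, b in combinations(names, 2):
        d = 1.0 - pearson(scores[a], scores[b])
        dist[(a, b)] = dist[(b, a)] = d

    clusters = [[n] for n in names]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            pairs = [(a, b) for a in clusters[i] for b in clusters[j]]
            avg = sum(dist[p] for p in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # i < j, so index i stays valid
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical scores of 5 models on 4 benchmarks, for illustration only.
scores = {
    "bench_A": [10, 20, 30, 40, 50],
    "bench_B": [12, 22, 31, 43, 52],
    "bench_C": [50, 40, 30, 20, 10],
    "bench_D": [48, 41, 29, 22, 9],
}
print(cluster_benchmarks(scores, 2))
# [['bench_A', 'bench_B'], ['bench_C', 'bench_D']]
```

Correlation-based distance groups benchmarks by which models they reward rather than by absolute difficulty, which is the property that matters when asking whether two benchmarks measure the same capability.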