thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Comparing with LLaVA 1.6 Next #1

Open choyakawa opened 6 months ago

choyakawa commented 6 months ago

LLaVA 1.6 (LLaVA-NeXT): https://llava-vl.github.io/blog/2024-01-30-llava-next/ Some benchmark results for the 13B version are also available.

choyakawa commented 6 months ago

LLM analysis from Gemini 1.5 Pro:

Benchmark	LLaVA-UHD-13B	LLaVA-NeXT-7B	LLaVA-NeXT-13B	LLaVA-NeXT-34B	LLaVA 1.5-13B
VQAv2	81.7	81.8 (Vicuna) / 82.2 (Mistral)	82.8	*83.7*	80
GQA	65.2	64.2 (Vicuna) / 64.8 (Mistral)	65.4	*67.1*	63.3
TextVQA	67.7	64.9 (Vicuna) / 65.7 (Mistral)	67.1	*69.5*	61.3
ScienceQA	72	70.1 (Vicuna) / 72.8 (Mistral)	73.6	*81.8*	71.6
VizWiz	56.1	57.6 (Vicuna) / 60.0 (Mistral)	60.5	*63.8*	53.6
MMU (val)	36.4	35.8 (Vicuna) / 35.3 (Mistral)	36.2	*51.1*	36.4
MMU (test)	33.6	-	-	*44.7*	33.6
MME	1535	1519 (Vicuna) / 1498 (Mistral)	1575	*1631*	1531
POPE	*89.1*	86.5 (Vicuna) / 86.7 (Mistral)	86.2	87.7	85.9

Observations:

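LLaVA-UHD-13B matches or beats LLaVA 1.5-13B on every benchmark listed: it is ahead on all of them except MMU (val and test), where the two scores are identical.
The LLaVA-NeXT series is broadly comparable to LLaVA-UHD-13B, with small variations between the Vicuna and Mistral variants. At the same 13B scale, LLaVA-NeXT-13B leads on VQAv2, GQA, ScienceQA, VizWiz, and MME, while LLaVA-UHD-13B leads on TextVQA, MMU (val), and POPE.
LLaVA-NeXT-34B has the best score on every benchmark except POPE, where LLaVA-UHD-13B is highest (89.1). Its margin is largest on ScienceQA (81.8) and MMU (51.1 val, 44.7 test).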
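
For anyone who wants to check the head-to-head comparisons, here is a minimal Python sketch that tallies wins, ties, and losses between two models. The scores are copied from the table above for the three 13B models only. The `scores` dict and the `compare` helper are illustrative names, not part of any LLaVA codebase, and a missing result (the "-" entries) is stored as `None` and skipped.

```python
# Scores copied from the table above (13B models only).
# None marks a result that the table reports as "-".
MODELS = ("LLaVA-UHD-13B", "LLaVA-NeXT-13B", "LLaVA 1.5-13B")

scores = {
    "VQAv2":      (81.7, 82.8, 80.0),
    "GQA":        (65.2, 65.4, 63.3),
    "TextVQA":    (67.7, 67.1, 61.3),
    "ScienceQA":  (72.0, 73.6, 71.6),
    "VizWiz":     (56.1, 60.5, 53.6),
    "MMU (val)":  (36.4, 36.2, 36.4),
    "MMU (test)": (33.6, None, 33.6),
    "MME":        (1535, 1575, 1531),
    "POPE":       (89.1, 86.2, 85.9),
}


def compare(model_a, model_b):
    """Return benchmarks where model_a wins, ties, or loses against model_b.

    Higher is treated as better for every benchmark in the table.
    Benchmarks with a missing score for either model are skipped.
    """
    i, j = MODELS.index(model_a), MODELS.index(model_b)
    result = {"wins": [], "ties": [], "losses": []}
    for benchmark, row in scores.items():
        a, b = row[i], row[j]
        if a is None or b is None:
            continue
        if a > b:
            result["wins"].append(benchmark)
        elif a == b:
            result["ties"].append(benchmark)
        else:
            result["losses"].append(benchmark)
    return result


if __name__ == "__main__":
    for other in ("LLaVA 1.5-13B", "LLaVA-NeXT-13B"):
        print("LLaVA-UHD-13B vs", other, compare("LLaVA-UHD-13B", other))
```

This only counts which model is ahead. It says nothing about whether a gap such as 67.7 vs 67.1 on TextVQA is meaningful, since the table gives single numbers without variance.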