thunlp / LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Comparing with LLaVA 1.6 Next #1

Open choyakawa opened 6 months ago

choyakawa commented 6 months ago

LLaVA 1.6 (LLaVA-NeXT): https://llava-vl.github.io/blog/2024-01-30-llava-next/ Some benchmark results for the 13B version are also available there.

choyakawa commented 6 months ago

LLM-generated analysis from Gemini 1.5 Pro:

| Benchmark | LLaVA-UHD-13B | LLaVA-NeXT-7B | LLaVA-NeXT-13B | LLaVA-NeXT-34B | LLaVA 1.5-13B |
| --- | --- | --- | --- | --- | --- |
| VQAv2 | 81.7 | 81.8 (Vicuna) / 82.2 (Mistral) | 82.8 | 83.7 | 80 |
| GQA | 65.2 | 64.2 (Vicuna) / 64.8 (Mistral) | 65.4 | 67.1 | 63.3 |
| TextVQA | 67.7 | 64.9 (Vicuna) / 65.7 (Mistral) | 67.1 | 69.5 | 61.3 |
| ScienceQA | 72 | 70.1 (Vicuna) / 72.8 (Mistral) | 73.6 | 81.8 | 71.6 |
| VizWiz | 56.1 | 57.6 (Vicuna) / 60.0 (Mistral) | 60.5 | 63.8 | 53.6 |
| MMMU (val) | 36.4 | 35.8 (Vicuna) / 35.3 (Mistral) | 36.2 | 51.1 | 36.4 |
| MMMU (test) | 33.6 | - | - | 44.7 | 33.6 |
| MME | 1535 | 1519 (Vicuna) / 1498 (Mistral) | 1575 | 1631 | 1531 |
| POPE | 89.1 | 86.5 (Vicuna) / 86.7 (Mistral) | 86.2 | 87.7 | 85.9 |
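
For a quick like-for-like read of the two 13B models, here is a rough sketch that computes the per-benchmark gap between LLaVA-UHD-13B and LLaVA-NeXT-13B. The scores are copied from the table above. The names `SCORES` and `deltas_vs_next` are made up for this illustration and are not part of either project. MMMU (test) is left out because no LLaVA-NeXT-13B number is listed for it.

```python
# Scores copied from the table above: (LLaVA-UHD-13B, LLaVA-NeXT-13B).
# SCORES and deltas_vs_next are hypothetical names used only for this sketch.
SCORES = {
    "VQAv2": (81.7, 82.8),
    "GQA": (65.2, 65.4),
    "TextVQA": (67.7, 67.1),
    "ScienceQA": (72.0, 73.6),
    "VizWiz": (56.1, 60.5),
    "MMMU (val)": (36.4, 36.2),
    "MME": (1535.0, 1575.0),
    "POPE": (89.1, 86.2),
}


def deltas_vs_next(scores):
    """Return {benchmark: UHD score minus NeXT-13B score}, rounded to 1 decimal."""
    return {name: round(uhd - nxt, 1) for name, (uhd, nxt) in scores.items()}


deltas = deltas_vs_next(SCORES)

# Benchmarks where LLaVA-UHD-13B scores higher than LLaVA-NeXT-13B.
uhd_ahead = sorted(name for name, d in deltas.items() if d > 0)

for name, d in deltas.items():
    print(f"{name:12s} {d:+.1f}")
print("UHD-13B ahead on:", ", ".join(uhd_ahead))
```

Note that MME is on a different scale from the percentage-style benchmarks, so its gap should not be compared directly with the others.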

Observations: