choyakawa opened 6 months ago
LLM analysis from Gemini 1.5 Pro:
| Feature | LLaVA-UHD-13B | LLaVA-NeXT-7B | LLaVA-NeXT-13B | LLaVA-NeXT-34B | LLaVA-1.5-13B |
|---|---|---|---|---|---|
| VQAv2 | 81.7 | 81.8 (Vicuna) / 82.2 (Mistral) | 82.8 | 83.7 | 80.0 |
| GQA | 65.2 | 64.2 (Vicuna) / 64.8 (Mistral) | 65.4 | 67.1 | 63.3 |
| TextVQA | 67.7 | 64.9 (Vicuna) / 65.7 (Mistral) | 67.1 | 69.5 | 61.3 |
| ScienceQA | 72.0 | 70.1 (Vicuna) / 72.8 (Mistral) | 73.6 | 81.8 | 71.6 |
| VizWiz | 56.1 | 57.6 (Vicuna) / 60.0 (Mistral) | 60.5 | 63.8 | 53.6 |
| MMMU (val) | 36.4 | 35.8 (Vicuna) / 35.3 (Mistral) | 36.2 | 51.1 | 36.4 |
| MMMU (test) | 33.6 | - | - | 44.7 | 33.6 |
| MME | 1535 | 1519 (Vicuna) / 1498 (Mistral) | 1575 | 1631 | 1531 |
| POPE | 89.1 | 86.5 (Vicuna) / 86.7 (Mistral) | 86.2 | 87.7 | 85.9 |
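To make the size-matched comparison easier to eyeball, here is a minimal sketch that tabulates the per-benchmark deltas between LLaVA-UHD-13B and LLaVA-NeXT-13B using the numbers from the table above (the script itself, including the `scores` dict layout, is illustrative, not part of either project):

```python
# Hypothetical helper: per-benchmark delta between the two 13B models,
# scores copied from the comparison table above.
scores = {
    # benchmark: (LLaVA-UHD-13B, LLaVA-NeXT-13B)
    "VQAv2":      (81.7, 82.8),
    "GQA":        (65.2, 65.4),
    "TextVQA":    (67.7, 67.1),
    "ScienceQA":  (72.0, 73.6),
    "VizWiz":     (56.1, 60.5),
    "MMMU (val)": (36.4, 36.2),
    "MME":        (1535, 1575),
    "POPE":       (89.1, 86.2),
}

# Positive delta means LLaVA-NeXT-13B scores higher on that benchmark.
deltas = {bench: nxt - uhd for bench, (uhd, nxt) in scores.items()}

for bench, delta in deltas.items():
    print(f"{bench:11s} delta = {delta:+.1f}")
```

A quick read of the output: LLaVA-UHD-13B leads on POPE (and slightly on TextVQA and MMMU-val), while LLaVA-NeXT-13B leads on the rest, with the largest gaps on VizWiz and MME.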
Observations:
LLaVA-NeXT (LLaVA 1.6): https://llava-vl.github.io/blog/2024-01-30-llava-next/ ; some benchmark results for the 13B version are also available there.