haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
18.06k stars 1.96k forks

[Question] Fine-tuning the llava1.5-7b model with LoRA on the llava_mix_665k dataset only gets 545 on the MME perception score. Has anyone else encountered this issue? #1461

Open OliverLeeXZ opened 2 months ago

OliverLeeXZ commented 2 months ago

Question

I fine-tuned the llava1.5-7b model with LoRA on the llava_mix_665k dataset, using 4×A100-40G. However, the model performs poorly on the MME benchmark: perception score 545, cognition score 197. Has anyone else encountered this issue?
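For context, a condensed sketch of the reference LoRA recipe in the repo's scripts/v1_5/finetune_lora.sh (reconstructed here for illustration; output paths are placeholders and some logging/saving flags are omitted):

```bash
#!/bin/bash
# Sketch of the stock llava-v1.5-7b LoRA fine-tuning launch
# (scripts/v1_5/finetune_lora.sh); paths are illustrative.
deepspeed llava/train/train_mem.py \
    --lora_enable True --lora_r 128 --lora_alpha 256 --mm_projector_lr 2e-5 \
    --deepspeed ./scripts/zero3.json \
    --model_name_or_path lmsys/vicuna-7b-v1.5 \
    --version v1 \
    --data_path ./playground/data/llava_v1_5_mix665k.json \
    --image_folder ./playground/data \
    --vision_tower openai/clip-vit-large-patch14-336 \
    --mm_projector_type mlp2x_gelu \
    --mm_vision_select_layer -2 \
    --mm_use_im_start_end False \
    --mm_use_im_patch_token False \
    --image_aspect_ratio pad \
    --group_by_modality_length True \
    --bf16 True \
    --output_dir ./checkpoints/llava-v1.5-7b-lora \
    --num_train_epochs 1 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --learning_rate 2e-4 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --model_max_length 2048 \
    --gradient_checkpointing True \
    --lazy_preprocess True
```

Note that this recipe assumes 8 GPUs, i.e. a global batch of 16 × 8 = 128; on 4 GPUs, per_device_train_batch_size or gradient_accumulation_steps has to be raised to keep the same effective batch size.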

OliverLeeXZ commented 2 months ago

@haotian-liu My training hyperparameters are consistent with the ones you provided. Here are my partial training logs and MME results:

100%|██████████| 10396/10396 [22:39:17<00:00, 9.15s/it]
{'train_runtime': 81561.6745, 'train_samples_per_second': 8.157, 'train_steps_per_second': 0.127, 'train_loss': 3.424769993772549, 'epoch': 1.0}
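As a back-of-the-envelope check on the logged throughput (taking the stock recipe's global batch of 128 as the reference point):

```bash
# Effective global batch implied by the logs:
#   train_samples_per_second / train_steps_per_second = 8.157 / 0.127 ~ 64
# i.e. ~64 samples per optimizer step (likewise ~665k samples / 10396 steps).
awk 'BEGIN { printf "samples per step ~ %.1f\n", 8.157 / 0.127 }'
```

If those logged rates are accurate, the run used an effective batch of roughly 64 rather than the reference 128, which is worth double-checking against the stock setting.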

=========== Perception =========== total score: 545.515306122449

existence score: 48.333333333333336
count score: 50.0
position score: 48.333333333333336
color score: 55.00000000000001
posters score: 46.59863945578232
celebrity score: 55.0
scene score: 55.0
landmark score: 68.25
artwork score: 71.5
OCR score: 47.5

=========== Cognition =========== total score: 197.5

commonsense_reasoning score: 45.0
numerical_calculation score: 57.5
text_translation score: 75.0
code_reasoning score: 20.0
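For completeness, the scores above came from the repo's MME pipeline; a sketch of the evaluation path, assuming the stock scripts/merge_lora_weights.py and scripts/v1_5/eval/mme.sh (checkpoint paths are illustrative):

```bash
# Merge the LoRA adapter back into the base model before evaluating.
# Keep "llava" in the merged directory name so the model loader
# recognizes it as a multimodal checkpoint.
python scripts/merge_lora_weights.py \
    --model-path ./checkpoints/llava-v1.5-7b-lora \
    --model-base lmsys/vicuna-7b-v1.5 \
    --save-model-path ./checkpoints/llava-v1.5-7b-lora-merged

# Score MME with the provided script (MME data is expected under
# ./playground/data/eval/MME, as described in docs/Evaluation.md).
bash scripts/v1_5/eval/mme.sh
```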
zjysteven commented 2 weeks ago

@OliverLeeXZ Hi, just curious: have you found out the reason?