haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.54k stars · 2.27k forks

[LLaVA bench in the wild] which result to report? #958

Open g-h-chen opened 11 months ago

g-h-chen commented 11 months ago

Hi @haotian-liu,

| key | cand/anchor | anchor | cand |
| --- | --- | --- | --- |
| all | 81.0 | 83.7 | 67.8 |
| llava_bench_complex | 88.4 | 85.0 | 75.2 |
| llava_bench_conv | 77.0 | 80.6 | 62.1 |
| llava_bench_detail | 71.3 | 84.7 | 60.3 |

Here are the results we obtained for our model by running the eval scripts, but I'm confused about which result I should report in the paper.

Thanks in advance!

HenryHZY commented 9 months ago

@haotian-liu I think the answer is the value of `['all']['cand/anchor']`?

OliverLeeXZ commented 9 months ago

Same question! Which one should be reported? There is no `key | cand/anchor | anchor | cand` header in my output.

rohan598 commented 8 months ago

@haotian-liu I have the same question, can you share which result to present?

@OliverLeeXZ and @g-h-chen were you able to find the correct strategy?

@HenryHZY Could you expand on your response?

HenryHZY commented 8 months ago

> @haotian-liu I have the same question, can you share which result to present?
>
> @OliverLeeXZ and @g-h-chen were you able to find the correct strategy?
>
> @HenryHZY Could you expand on your response?

I think the result to report is the upper-left value, that is, the first value (`['all']['cand/anchor']`).

Under this setting, the value in model-zoo.md matches the average of the three results attached in eval.zip.
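For reference, here is a minimal Python sketch of how such a relative score could be aggregated. This is an illustrative reconstruction, not the repo's actual summarize script, and the per-question review scores below are made up: each GPT-4 review rates both the anchor answer (GPT-4's own) and the candidate answer (your model's), and the reported number is the candidate-to-anchor ratio as a percentage.

```python
# Hypothetical aggregation of LLaVA-Bench (in-the-wild) review scores.
# Names and data are illustrative, not from the repository.
reviews = [
    {"category": "llava_bench_complex", "anchor": 8.5, "cand": 7.5},
    {"category": "llava_bench_conv",    "anchor": 8.1, "cand": 6.2},
    {"category": "llava_bench_detail",  "anchor": 8.5, "cand": 6.0},
]

def summarize(reviews):
    # Accumulate anchor/candidate score sums per category and overall.
    totals = {}
    for r in reviews:
        for key in ("all", r["category"]):
            t = totals.setdefault(key, {"anchor": 0.0, "cand": 0.0})
            t["anchor"] += r["anchor"]
            t["cand"] += r["cand"]
    # The reported number is the relative score: cand / anchor * 100.
    return {key: round(100 * t["cand"] / t["anchor"], 1)
            for key, t in totals.items()}

print(summarize(reviews))
```

Under this reading, the `all` row's `cand/anchor` entry is the single number to quote, matching the upper-left value in the table printed by the eval scripts.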

rohan598 commented 8 months ago

> @haotian-liu I have the same question, can you share which result to present? @OliverLeeXZ and @g-h-chen were you able to find the correct strategy? @HenryHZY Could you expand on your response?
>
> I think the result is the upper left value, that is, the first value.
>
> Under this setting, the result of model-zoo.md matches the average result of the three attached results in eval.zip.

Yes, I verified this too, thank you!

ppalantir commented 3 months ago

> @haotian-liu I have the same question, can you share which result to present? @OliverLeeXZ and @g-h-chen were you able to find the correct strategy? @HenryHZY Could you expand on your response?
>
> I think the result is the upper left value, that is, the first value. Under this setting, the result of model-zoo.md matches the average result of the three attached results in eval.zip.
>
> Yes, I too verified, thank you for this!

Hi, when I run `CUDA_VISIBLE_DEVICES=0 bash scripts/v1_5/eval/llavabench.sh`,

it reports the error "No API key provided. You can set your API key in code using 'openai.api_key = ', or you can set the environment variable OPENAI_API_KEY=)."

Does this review step necessarily require the OpenAI API? And could you suggest a cheap API?
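Assuming the review step uses the standard `openai` client unchanged, that error just means the key was never provided: exporting `OPENAI_API_KEY` in the shell before launching the script should resolve it. A minimal sketch, with `YOUR_KEY_HERE` as a placeholder:

```python
import os

# Export the key before launching the eval script, e.g. in the shell:
#   export OPENAI_API_KEY="YOUR_KEY_HERE"
# or set it from Python before the review code runs (placeholder value):
os.environ.setdefault("OPENAI_API_KEY", "YOUR_KEY_HERE")
print("OPENAI_API_KEY set:", bool(os.environ.get("OPENAI_API_KEY")))
```

`setdefault` only fills the variable if it is not already set, so an existing exported key is left untouched.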