The overall accuracy reported in LLaVA-1.5 is based on the MMBench_CN dataset, aggregated from the TEST set and the DEV set. In my already established environment, the overall accuracy I measured on MMBench_CN matches the official number, and I also tested the accuracy on MMBench_DEV_CN as shown below. The low number you measured is probably related to your environment. The relevant parts of my environment are listed below; you can refer to them, upgrade, and re-test.
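For reference, the overall score described above would be a sample-weighted average of the two splits. A minimal sketch, assuming hypothetical split sizes (the real MMBench_CN DEV/TEST counts are not stated in this thread):

```python
# Sketch of a sample-weighted overall accuracy over two splits.
# n_dev and n_test are hypothetical placeholders, not the real
# MMBench_CN split sizes.
def overall_accuracy(acc_dev, n_dev, acc_test, n_test):
    return (acc_dev * n_dev + acc_test * n_test) / (n_dev + n_test)

# Made-up example: 60.0 on a 1000-sample DEV split and 65.0 on a
# 2000-sample TEST split give an overall of about 63.33.
print(overall_accuracy(60.0, 1000, 65.0, 2000))
```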
@FangXinyu-0913 Thank you very much for your reply. However, your overall result of 54.72 is also lower than the reported result (MMBench-CN 63.6) by a large margin. Do you have any idea about the possible reasons?
Hi @ppalantir, we reproduced the problem and also found that the previous results of llava_v1.5 on some Chinese benchmarks (MMBench-CN, CCBench, etc.) can no longer be reproduced. We have asked the authors of LLaVA but still cannot figure out the reason, so we have updated the leaderboard to align with the current evaluation results for now.
Thank you for your awesome work!
I followed the README.md and ran the command

```bash
python run.py --data MMBench_DEV_CN --model llava_v1.5_13b --verbose
```

and found that the overall accuracy is much lower than LLaVA-1.5's reported result (MMBench-CN 63.6).
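Since the replies above point to the environment as the likely cause, one quick way to snapshot the versions that typically affect evaluation results, for comparison across setups. The package list here is an assumption, not taken from this thread; extend it to match your installation:

```python
# Print versions of packages that commonly influence VLM evaluation.
# The package list is an assumption; adjust it to your actual setup.
import importlib

for pkg in ("torch", "transformers", "accelerate"):
    try:
        mod = importlib.import_module(pkg)
        print(f"{pkg}=={getattr(mod, '__version__', 'unknown')}")
    except ImportError:
        print(f"{pkg}: not installed")
```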