Ovis1.5-Llama3-8B在Hallusion Bench上的指标和榜单上的指标差距过大

open-compass / VLMEvalKit

Open-source evaluation toolkit of large vision-language models (LVLMs), support 160+ VLMs, 50+ benchmarks

Apache License 2.0

1.4k stars 195 forks source link

Hi, @LIRENDA621 , I have re-evaluated this model (torch2.4+cu121, transformers==4.46.2), and got an accuracy of ~42.3%, which looks inferior to previous evaluation results. However, we are not sure whether it's due to randomness.

We will re-evaluate this model soon to see if all evaluation results are significant different. If so, we will update the leaderboard and OpenVLMRecords. You can also find the prediction files corresponding to the 45% average accuracy in https://huggingface.co/datasets/VLMEval/OpenVLMRecords and check if there is some problems.

open-compass / VLMEvalKit

Ovis1.5-Llama3-8B在Hallusion Bench上的指标和榜单上的指标差距过大 #595