Open LEE-IU opened 6 days ago

Dear authors, I noticed that your paper compares the AvisC method against a “base” model. Could you confirm whether “base” refers to the experimental results on the MME dataset obtained with the LLaVA-v1.5-7b model? The reported score appears significantly lower than the one in the original LLaVA paper, and I would like to understand the reason for this discrepancy. If “base” does not correspond to the original results reported in the LLaVA paper, could you please clarify what it refers to? I look forward to your reply, and thank you in advance.

Hi @LEE-IU, thanks for your interest in our work!

To clarify, the “base” in our comparison refers to the results obtained on the MME dataset with the LLaVA-v1.5-7b model as implemented in our codebase. The discrepancy you observed relative to the original LLaVA paper is likely due to differences in several factors: the codebase, the hyperparameters (such as the random seed and the alpha value used for decoding), and the experimental setup. Additionally, the MME dataset contains only a few hundred images, so total scores on this benchmark are highly sensitive to small variations. To account for this variability, we report results averaged over multiple runs, each using a different randomly selected seed.

We hope this explanation resolves your confusion!
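For readers wondering what the seed-averaging protocol described in the reply might look like, here is a minimal, hypothetical sketch. `evaluate_mme`, the seed list, and the simulated score range are all invented for illustration; a real run would load LLaVA-v1.5-7b and score the benchmark through the repo's actual evaluation entry point:

```python
import random
import statistics

def evaluate_mme(seed: int) -> float:
    """Placeholder for one MME evaluation run.

    A real implementation would fix all RNG state to `seed`, run
    LLaVA-v1.5-7b over the benchmark, and return the total score.
    Here we just simulate run-to-run variance around a nominal score.
    """
    rng = random.Random(seed)
    return 1500.0 + rng.uniform(-20.0, 20.0)

# Any handful of distinct, randomly chosen seeds.
seeds = [13, 42, 2024]
scores = [evaluate_mme(s) for s in seeds]

mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
print(f"MME total: {mean_score:.1f} +/- {std_score:.1f} over {len(seeds)} seeds")
```

Because single-run MME totals can swing by tens of points from seed effects alone, reporting the mean (ideally with a spread) is what makes the base-vs-AvisC comparison meaningful.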