Open LEE-IU opened 6 days ago

Dear authors, I noticed that your paper compares the AvisC method against a “base” model. Could you confirm whether “base” refers to the experimental results on the MME dataset obtained with the LLaVA-v1.5-7b model? The reported score appears significantly lower than the one in the original LLaVA paper, and I would like to understand the reason for this discrepancy. If “base” does not correspond to the original results reported in the LLaVA paper, could you please clarify what it refers to? I look forward to your reply, and thank you in advance.

Hi @LEE-IU, thanks for your interest in our work!

To clarify, the “base” in our comparison refers to the results obtained on the MME dataset with the LLaVA-v1.5-7b model as implemented in our codebase. The discrepancy you observed relative to the original LLaVA paper is likely due to differences in several factors: the codebase, the hyperparameters (such as the random seed and the alpha value used for decoding), and the experimental setup. Additionally, the MME dataset contains only a few hundred images, so total scores on this benchmark are highly sensitive to small variations. To account for this variability, we report results averaged over multiple runs, each using a different randomly selected seed.

We hope this explanation resolves your confusion!
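For readers wondering what the seed-averaging protocol described in the reply might look like, here is a minimal, hypothetical sketch. `evaluate_mme`, the seed list, and the simulated score range are all invented for illustration; a real run would load LLaVA-v1.5-7b and score the benchmark through the repo's actual evaluation entry point:

```python
import random
import statistics

def evaluate_mme(seed: int) -> float:
    """Placeholder for one MME evaluation run.

    A real implementation would fix all RNG state to `seed`, run
    LLaVA-v1.5-7b over the benchmark, and return the total score.
    Here we just simulate run-to-run variance around a nominal score.
    """
    rng = random.Random(seed)
    return 1500.0 + rng.uniform(-20.0, 20.0)

# Any handful of distinct, randomly chosen seeds.
seeds = [13, 42, 2024]
scores = [evaluate_mme(s) for s in seeds]

mean_score = statistics.mean(scores)
std_score = statistics.stdev(scores)
print(f"MME total: {mean_score:.1f} +/- {std_score:.1f} over {len(seeds)} seeds")
```

Because single-run MME totals can swing by tens of points from seed effects alone, reporting the mean (ideally with a spread) is what makes the base-vs-AvisC comparison meaningful.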