vlf-silkie / VLFeedback

Impact of Including GPT-4V in LVLM Pool? #1

Open Etelis opened 6 months ago

Etelis commented 6 months ago

First and foremost, thank you for writing this paper; it was very intriguing and informative. I have a question that arose during my reading.

What are the conceptual benefits when the supervisor model (GPT-4V) is included in the LVLM pool? Wouldn't this approach inherently bias the outcomes towards the decisions of GPT-4V? If so, how does the ensemble benefit in this scenario?

TobiasLee commented 6 months ago

Thank you for engaging with our paper. We appreciate your thoughtful question and the opportunity to clarify the inclusion of the GPT-4V model in our study.

GPT-4V is integrated into our LVLM pool due to its status as a representative commercial LVLM that is readily accessible. As highlighted in the preliminary study on GPT-4V (refer to link), it stands out as one of the most powerful LVLMs currently available. Importantly, its performance serves as a benchmark, forming the foundation for its role as the annotator in our ensemble.

Concerning the potential bias towards GPT-4V outcomes, particularly in annotated ratings, we acknowledge the possibility of unreliability and bias associated with GPT-4V annotations. To address this, we conducted a correlation analysis (refer to Paragraph 3 in Sec 2.4) comparing human annotators to GPT-4V. Impressively, this analysis revealed an average agreement rate of 83.1%, demonstrating a substantial alignment between human and GPT-4V annotations.
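As a rough illustration of what an agreement rate measures, it can be computed as the fraction of comparisons on which the human annotator and GPT-4V prefer the same response. The sketch below is only an assumption about the general shape of such a calculation; the function name `agreement_rate` and the toy labels are hypothetical and are not the evaluation code or data from the paper.

```python
def agreement_rate(human_prefs, model_prefs):
    """Fraction of items on which two annotators picked the same label."""
    if len(human_prefs) != len(model_prefs):
        raise ValueError("both annotators must label the same items")
    if not human_prefs:
        raise ValueError("at least one labeled item is required")
    matches = sum(h == m for h, m in zip(human_prefs, model_prefs))
    return matches / len(human_prefs)


# Toy labels (hypothetical): which of two responses, "A" or "B", was preferred.
human = ["A", "B", "A", "A", "B", "A"]
gpt4v = ["A", "B", "B", "A", "B", "A"]

rate = agreement_rate(human, gpt4v)  # the annotators agree on 5 of 6 items
print(f"agreement: {rate:.1%}")
```

A plain match rate like this does not correct for agreement that would occur by chance, so it is best read alongside the study's own description of the analysis.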

Moreover, in our DPO experiments, we implemented a "GPT-4V always as the best" strategy, in which the GPT-4V response was always selected as the chosen response in each DPO pair. Notably, this simple heuristic did not significantly outperform the original backbone model. This outcome suggests that biasing decisions towards GPT-4V does not by itself guarantee a performance improvement, emphasizing the nuanced nature of model ensemble dynamics.
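The "GPT-4V always as the best" heuristic could be sketched roughly as follows: for each prompt, the GPT-4V response is always the chosen side of a pair and every other model's response is the rejected side, regardless of annotated ratings. This is a minimal sketch under stated assumptions; the function name `gpt4v_as_best_pairs`, the dictionary layout, and the example model names other than GPT-4V are hypothetical and do not reflect the repository's actual data format or training code.

```python
def gpt4v_as_best_pairs(prompt, responses, preferred_model="GPT-4V"):
    """Build preference pairs that always treat one model's output as chosen.

    `responses` maps a model name to that model's response text (assumed layout).
    Prompts without a response from the preferred model yield no pairs.
    """
    if preferred_model not in responses:
        return []
    chosen = responses[preferred_model]
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": text}
        for model, text in responses.items()
        if model != preferred_model
    ]


# Toy example with hypothetical model names and responses.
example = {
    "GPT-4V": "A brown dog is catching a frisbee in a park.",
    "model_a": "A dog.",
    "model_b": "A cat is sitting on a sofa.",
}
pairs = gpt4v_as_best_pairs("Describe the image.", example)
for pair in pairs:
    print(pair["chosen"], "|", pair["rejected"])
```

Because the ratings are ignored entirely, a heuristic like this isolates the effect of simply favoring GPT-4V outputs, which is what makes it a useful comparison point for the annotated preference data.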

We hope this provides clarity on the conceptual benefits of incorporating GPT-4V into our LVLM pool and how potential biases are addressed and validated in our study. If you have any further questions or require additional information, please feel free to ask.