huggingface / alignment-handbook

Robust recipes to align language models with human and AI preferences
https://huggingface.co/HuggingFaceH4
Apache License 2.0

Question about AI Feedback (AIF) #90

Open HaoruSung opened 9 months ago

HaoruSung commented 9 months ago

In the AI Feedback (AIF) phase, GPT-4 serves as the teacher model. Is GPT-4 prone to assigning higher ratings to its own outputs than to those of other models?

I am also interested in how often each large language model was chosen as $y_w$ (the preferred response) during the AIF evaluation in your study. Have you analyzed how frequently each LLM's output was selected as $y_w$?
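
To make the second question concrete, here is a minimal sketch of the tally I have in mind. It assumes each preference pair is a record with a hypothetical `chosen_model` field naming the model that produced $y_w$. The field names and the toy records are made up for illustration and are not taken from your dataset.

```python
from collections import Counter


def chosen_model_distribution(records):
    """Return the fraction of comparisons in which each model supplied y_w.

    `records` is an iterable of dicts with a hypothetical "chosen_model" key.
    """
    counts = Counter(r["chosen_model"] for r in records)
    total = sum(counts.values())
    if total == 0:
        # No comparisons, so there is no distribution to report.
        return {}
    return {model: n / total for model, n in counts.items()}


# Toy, made-up records purely for illustration (not real data).
records = [
    {"chosen_model": "model_a", "rejected_model": "model_b"},
    {"chosen_model": "model_a", "rejected_model": "model_c"},
    {"chosen_model": "model_b", "rejected_model": "model_c"},
    {"chosen_model": "model_c", "rejected_model": "model_a"},
]
distribution = chosen_model_distribution(records)
```

If GPT-4's share in such a tally were much larger than that of the other models, that could hint at the self-preference I asked about above, although it could equally reflect a genuine quality gap.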

Thank you!