Thanks for sharing your excellent research.
I'm training a fully fine-tuned reward model (without QLoRA) from "LLaVA-RLHF-13b-v1.5-336/sft_model" on LLaVA-Human-Preference-10K, and I find that the eval accuracy is around 63%–67%. This seems lower than expected, since on NLP preference datasets reward accuracy is typically around 75%.
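For reference, by "eval accuracy" I mean the standard pairwise preference accuracy: the fraction of pairs where the reward model scores the human-chosen response above the rejected one. A minimal sketch of how I compute it (the tensor names here are just placeholders for the scalar rewards the model assigns to each pair):

```python
import torch

def pairwise_accuracy(chosen_rewards: torch.Tensor,
                      rejected_rewards: torch.Tensor) -> float:
    # Fraction of preference pairs where the reward model ranks the
    # human-chosen response above the rejected one.
    return (chosen_rewards > rejected_rewards).float().mean().item()
```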
Is this performance sufficient for the RLHF pipeline, or do you have any intuition on how to improve it?