Reward Model based Evaluations

huggingface / evaluation-guidebook

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!


Reward Model based Evaluations #17

Closed: sanderland closed this 2 days ago

sanderland commented 6 days ago

This adds a section on reward model based evaluations.

I've used these a lot but haven't seen much written about them, possibly because high-performing open reward models are a fairly recent development. They also have a number of pitfalls that I've seen again and again (for example, the absolute scores being somewhat meaningless).
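
To illustrate the point about absolute scores, here is a minimal sketch. It assumes a reward model trained with a Bradley-Terry style pairwise objective, which is a common setup but is not confirmed for any specific model in this thread. The function name and the score values are hypothetical. Under that assumption, the preference between two responses depends only on the difference of their scores, so adding a constant to every score changes nothing.

```python
import math


def preference_probability(reward_a: float, reward_b: float) -> float:
    """Bradley-Terry probability that response A is preferred over response B.

    Assumes scalar rewards from a pairwise-trained reward model (hypothetical).
    """
    return 1.0 / (1.0 + math.exp(-(reward_a - reward_b)))


# Hypothetical reward scores for two responses to the same prompt.
p_original = preference_probability(2.0, 0.5)

# Shifting both scores by the same constant leaves the preference unchanged,
# which is why a single absolute score carries little meaning on its own.
p_shifted = preference_probability(2.0 + 10.0, 0.5 + 10.0)

print(p_original, p_shifted)
```

In practice this suggests comparing scores for responses to the same prompt rather than reading a single score in isolation.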

This probably needs a bit of work on references and backlinking, but I thought I'd start here, and we can iterate if you think this would be an interesting addition.

clefourrier commented 6 days ago

Hi! Yes, sounds super cool, I have not found a lot of literature on those!

I would organize it in the following way.

# Reward Model Based Evaluation
## What is a reward-model?
Includes your intro, the Bradley-Terry models, and the "Other types of reward models" section. Since they are short, I'm not sure they need sub-headers.
I'd love to see a few more references there, notably if you know of cool blogs that explain the concepts simply.

## How do I use a reward-model for evaluation?

## Pros and cons of reward models

## Tips and tricks of using reward models for evaluation

sanderland commented 6 days ago

I've used your proposed setup and linked some example models, but I'm trying to keep the focus on evaluations rather than on reward models themselves. I don't know many cool blogs about this, but if I tag @natolambert he might come along and link to relevant things.