haozheji / exact-optimization

ICML 2024 - Official Repository for EXO: Towards Efficient Exact Optimization of Language Model Alignment
https://arxiv.org/abs/2402.00856
MIT License

Calculating reward model win rates to reproduce experiments #5

Closed sdsfas12 closed 2 months ago

sdsfas12 commented 2 months ago

Hi there! Your work is exciting and inspiring. Thanks a lot!

I'm currently trying to reproduce the experiments, but my reproduced reward model win rates are much lower than those reported in the paper. I wonder whether I'm using the wrong reward model for inference or calculating win rates incorrectly. Could you share more details about the reward model used for inference and the exact procedure for calculating win rates (e.g., counting wins/losses per comparison, or aggregating the wins/losses of all comparisons for each sample and then counting wins/losses at the sample level)?

haozheji commented 2 months ago

Hi! Thanks for your interest in our work!

We are uploading the reward model checkpoints to https://huggingface.co/collections/ehzoah/efficient-exact-optimization-667995e5a7f87dff7d01a85a. Running the provided training script should also yield a reward model with similar performance.
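
For scoring samples with the reward model, a minimal sketch along these lines should work (the checkpoint name below is a placeholder, and loading via a standard single-logit sequence-classification head is an assumption; check the config of the uploaded checkpoints):

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder checkpoint name for illustration; substitute a real checkpoint
# from the Hugging Face collection linked above.
ckpt = "ehzoah/reward-model-placeholder"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt, num_labels=1)
model.eval()

def reward_score(prompt, completion):
    # Return the scalar reward for one (prompt, completion) pair.
    inputs = tokenizer(prompt + completion, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return model(**inputs).logits[0, 0].item()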

The final win rate is calculated by averaging the win rate of each example. The core logic is something like this:

# Per-example reward scores: one inner list per prompt,
# one score per sampled completion.
policy_scores = [[1.0, 0.8]]  # scores of policy samples
ref_scores = [[0.9, 0.6]]     # scores of reference samples

wins = []
for ps, rs in zip(policy_scores, ref_scores):
    # Compare every policy sample against every reference sample
    # for this prompt.
    samp_wins = [p > r for p in ps for r in rs]
    # Win rate for this example: fraction of pairwise comparisons
    # won by the policy.
    wins.append(sum(samp_wins) / len(samp_wins))

# Final win rate: average of the per-example win rates.
win_perc = sum(wins) / len(wins)
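
Note that this aggregates at the example level: for each prompt, the win rate over all policy-vs-reference sample comparisons is computed first, and those per-example win rates are then averaged, rather than pooling all comparisons across the dataset.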
sdsfas12 commented 2 months ago

Okay, thanks for your kind answer! I'll try your checkpoints and win rate calculation code.