Hi, I've been searching for an explanation of why the Elo scores are rescaled relative to one of the models in elo_analysis when computing the MLE Elo ratings, but couldn't find one. Specifically, I see this line added at the end of compute_elo_mle_with_tie:
if "mixtral-8x7b-instruct-v0.1" in models.index:
    elo_scores += 1114 - elo_scores[models["mixtral-8x7b-instruct-v0.1"]]
This effectively anchors the scores to mixtral-8x7b-instruct-v0.1 with a rating of 1114. Is there any explanation for the choice of this model and the seemingly arbitrary number 1114? How does this affect the overall Elo ratings and the corresponding bootstrap CIs? Was it chosen so that Elo ratings can be compared over time? I'm asking because I noticed that the bootstrap CIs become much wider with the anchor model, and I'm not sure which model and anchor value to choose for my own set of models.
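For context, my understanding is that adding a constant offset like this should not change predicted win probabilities, since Elo (Bradley-Terry) probabilities depend only on rating differences. A minimal sketch of what I mean (the ratings below are made up, not real leaderboard values):

```python
import numpy as np

def win_prob(r_a, r_b, base=10, scale=400):
    # Standard Elo win probability: depends only on the rating difference.
    return 1 / (1 + base ** ((r_b - r_a) / scale))

# Hypothetical unanchored MLE ratings for three models.
ratings = np.array([1000.0, 1050.0, 900.0])

# Anchor model 0 to 1114, mirroring the line in compute_elo_mle_with_tie.
anchored = ratings + (1114 - ratings[0])

# Pairwise win probabilities are identical before and after the shift.
print(win_prob(ratings[1], ratings[2]))   # same value...
print(win_prob(anchored[1], anchored[2])) # ...as this one
```

So the offset itself seems purely cosmetic for point estimates, which is why the widening of the bootstrap CIs surprised me: my guess is that the anchor model's own sampling variance gets added to every other model's rating in each bootstrap round, but I'd appreciate confirmation.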
Any help will be appreciated, thanks!