Pentanomial elo over WDL elo

gahtan-syarif commented 6 months ago

Since you already have the pentanomial information for the match results, why not use the more accurate pentanomial Elo in place of the current WDL Elo? The formula is very similar to the 3dkingdoms Elo formula you're already using, with some minor adjustments. Here is a Python version of the pentanomial Elo calculator:

import numpy as np
from scipy.stats import norm

def score_to_elo(score):
    return -400 * np.log(1 / score - 1) / np.log(10)

def elo_to_score(elo):
    return (1 / (1 + 10 ** (-elo / 400)))

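# Game-pair counts, ordered from worst to best pair result:
# [double loss, single loss, double draw, single win, double win]
Pentanomial = [1, 3, 4, 7, 2]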
double_wins = Pentanomial[4]
single_wins = Pentanomial[3]
double_draws = Pentanomial[2]
single_loss = Pentanomial[1]
double_loss = Pentanomial[0]

total_pairs = double_wins + single_wins + double_draws + single_loss + double_loss
score = (double_wins + single_wins * 0.75 + double_draws * 0.5 + single_loss * 0.25 + double_loss * 0) / total_pairs

double_wins_p = double_wins / total_pairs
single_wins_p = single_wins / total_pairs
double_draws_p = double_draws / total_pairs
single_loss_p = single_loss / total_pairs
double_loss_p = double_loss / total_pairs

double_wins_dev = double_wins_p * (1 - score) ** 2
single_wins_dev = single_wins_p * (0.75 - score) ** 2
double_draws_dev = double_draws_p * (0.5 - score) ** 2
single_loss_dev = single_loss_p * (0.25 - score) ** 2
double_loss_dev = double_loss_p * (0 - score) ** 2
# Dividing the per-pair variance by total_pairs makes this the standard error of the mean score
std_deviation = np.sqrt((double_wins_dev + single_wins_dev + double_draws_dev + single_loss_dev + double_loss_dev) / total_pairs)

confidence_p = 0.95
min_confidence_p = (1 - confidence_p) / 2
max_confidence_p = 1 - min_confidence_p
score_lower_bound = score + norm.ppf(min_confidence_p) * std_deviation
score_upper_bound = score + norm.ppf(max_confidence_p) * std_deviation

elo = score_to_elo(score)
elo_upper_bound = score_to_elo(score_upper_bound)
elo_lower_bound = score_to_elo(score_lower_bound)
error_margin = (elo_upper_bound - elo_lower_bound) / 2
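The script above can also be wrapped in a reusable function. The following is only a minimal sketch under a few assumptions: the name pentanomial_elo and the choice to return None are hypothetical, and the standard library's statistics.NormalDist stands in for scipy's norm.ppf. The guard is there because the logistic conversion is undefined when a score bound reaches 0 or 1, which can happen with very few or very one-sided pairs.

```python
import math
from statistics import NormalDist

# Score per game pair for [double loss, single loss, double draw, single win, double win]
PAIR_SCORES = [0.0, 0.25, 0.5, 0.75, 1.0]


def score_to_elo(score):
    return -400 * math.log10(1 / score - 1)


def pentanomial_elo(counts, confidence=0.95):
    """Return (elo, error_margin), or None when the result is undefined.

    Hypothetical helper: counts uses the same ordering as PAIR_SCORES.
    """
    total_pairs = sum(counts)
    if total_pairs == 0:
        return None

    score = sum(s * c for s, c in zip(PAIR_SCORES, counts)) / total_pairs
    variance = sum(c * (s - score) ** 2 for s, c in zip(PAIR_SCORES, counts)) / total_pairs
    std_error = math.sqrt(variance / total_pairs)

    # Two-sided quantile, e.g. about 1.96 for a 95% interval
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lower = score - z * std_error
    upper = score + z * std_error

    # The logistic Elo conversion only works for scores strictly between 0 and 1
    if not (0 < lower and upper < 1):
        return None

    elo = score_to_elo(score)
    error_margin = (score_to_elo(upper) - score_to_elo(lower)) / 2
    return elo, error_margin


result = pentanomial_elo([1, 3, 4, 7, 2])
if result is not None:
    print("elo = %.2f +/- %.2f" % result)
```

Returning None instead of raising is just one possible choice; a caller could equally display a placeholder until enough pairs have been played.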

gahtan-syarif commented 6 months ago

Another suggestion I'd like to make is to include the likelihood of superiority (LOS), i.e. the probability that one engine is better than the other. In Python it can be written like this:

elo_stdev = error_margin / norm.ppf(max_confidence_p)
LOS = norm(elo, elo_stdev).sf(0)

In JavaScript, I believe the norm function can be replaced with:

const LOS = math.erfc((-elo) / (Math.sqrt(2) * elo_stdev)) / 2;

Edit: I might have to double-check the JavaScript LOS since I'm not too familiar with the language, but I've already tested the Python version.

Edit 2: I've now confirmed that both formulas written above are functionally equivalent.
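The equivalence of the two LOS formulas can be checked numerically. This is a small sketch using only the Python standard library; the function names los_from_normal and los_from_erfc are hypothetical, and statistics.NormalDist is used here in place of scipy's norm. It assumes elo_stdev is strictly positive.

```python
import math
from statistics import NormalDist


def los_from_normal(elo, elo_stdev):
    # P(X > 0) for X ~ Normal(elo, elo_stdev), same idea as norm(elo, elo_stdev).sf(0)
    return 1 - NormalDist(elo, elo_stdev).cdf(0)


def los_from_erfc(elo, elo_stdev):
    # Closed form via the complementary error function, mirroring the JavaScript one-liner
    return math.erfc(-elo / (math.sqrt(2) * elo_stdev)) / 2


for elo, elo_stdev in [(0.0, 5.0), (12.5, 8.0), (-30.0, 10.0)]:
    a = los_from_normal(elo, elo_stdev)
    b = los_from_erfc(elo, elo_stdev)
    print(elo, elo_stdev, a, b, abs(a - b) < 1e-9)
```

An Elo estimate of exactly 0 gives an LOS of 0.5 in both versions, which is a quick sanity check for any port of the formula.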

truekendor commented 6 months ago

Thanks for the suggestion, @gahtan-syarif!

I'll probably add what you suggested.

Can you provide the repository/source from where you got the code so I can double check it and acknowledge the authors?

Many thanks

gahtan-syarif commented 6 months ago

You're welcome! The original code is by mathematician Michel Van den Bergh, who is responsible for the mathematics in Stockfish's testing framework, fishtest. The code for the pentanomial Elo can be found here: https://github.com/official-stockfish/fishtest/blob/master/server/fishtest/stats/stat_util.py (specifically the functions def stats and def get_elo). It's written in a more condensed form there, but the logic and the results should be the same as the version I gave you, which is closer to the 3dkingdoms way of writing it for easier readability.

gahtan-syarif commented 6 months ago

Here is another way of writing the code, by OpenBench creator Andrew Grant, which he recently posted in the Stockfish Discord:

import math
from scipy.stats import norm

def elo(x):
    return -400 * math.log10(1 / x - 1)

# R[f] is the number of game pairs with pair score f / 4 (index 0 = double loss, 4 = double win)
R = [275, 4660, 8569, 4040, 174]
N = sum(R)

mu  = sum((f / 4.0) * R[f] for f in range(len(R))) / N
var = sum(((f / 4.0) - mu)**2 * R[f] for f in range(len(R))) / N

mu_min = mu + norm.ppf(0.025) * math.sqrt(var) / math.sqrt(N)
mu_max = mu + norm.ppf(0.975) * math.sqrt(var) / math.sqrt(N)

print(elo(mu_min), elo(mu), elo(mu_max))
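To see that this condensed version and the longer per-outcome version compute the same standard error, the two can be compared directly. The sketch below is only an illustration: the function names are hypothetical, and it assumes both versions order the counts from double loss to double win.

```python
import math

PAIR_SCORES = [0.0, 0.25, 0.5, 0.75, 1.0]


def std_error_per_outcome(counts):
    # Longer style: weight each squared deviation by its outcome probability
    total_pairs = sum(counts)
    score = sum(s * c for s, c in zip(PAIR_SCORES, counts)) / total_pairs
    probs = [c / total_pairs for c in counts]
    dev_sum = sum(p * (s - score) ** 2 for s, p in zip(PAIR_SCORES, probs))
    return math.sqrt(dev_sum / total_pairs)


def std_error_condensed(counts):
    # Condensed style: variance over raw counts, then sqrt(var) / sqrt(N)
    n = sum(counts)
    mu = sum((f / 4.0) * counts[f] for f in range(len(counts))) / n
    var = sum(((f / 4.0) - mu) ** 2 * counts[f] for f in range(len(counts))) / n
    return math.sqrt(var) / math.sqrt(n)


for counts in ([1, 3, 4, 7, 2], [275, 4660, 8569, 4040, 174]):
    a = std_error_per_outcome(counts)
    b = std_error_condensed(counts)
    print(counts, a, b, math.isclose(a, b))
```

Since both the mean score and the standard error agree, the resulting Elo and its bounds should agree as well, up to floating-point rounding.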

truekendor commented 6 months ago

@gahtan-syarif Many thanks, I'll check it later

truekendor / better-ccc-extension

Pentanomial elo over WDL elo #19