Open gahtan-syarif opened 6 months ago
Another suggestion I'd like to make is to include the likelihood of superiority (LOS), i.e. the probability that one engine is better than the other. In Python it can be written like this:
elo_stdev = error_margin / norm.ppf(max_confidence_p)
LOS = norm(elo, elo_stdev).sf(0)
In JavaScript, I believe the norm function can be replaced with:
const LOS = math.erfc((-elo) / (Math.sqrt(2) * elo_stdev)) / 2;
Edit: I might have to double-check the JavaScript LOS since I'm not too familiar with the language, but I've already tested the Python version.
Edit 2: I've now confirmed that both formulas written above are functionally equivalent.
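For anyone who wants to check the equivalence without SciPy, here is a minimal sketch using only the Python standard library. The function name los_from_margin and the confidence parameter are made up for illustration, and it assumes error_margin is the half-width of a two-sided interval (so the quantile is 0.975 for a 95% interval), which may not match how max_confidence_p is defined above.

```python
import math
from statistics import NormalDist


def los_from_margin(elo, error_margin, confidence=0.95):
    """Return LOS computed two ways: via the normal survival function and via erfc.

    Assumption (not from the thread): error_margin is the half-width of a
    two-sided confidence interval, so the quantile is (1 + confidence) / 2.
    """
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    elo_stdev = error_margin / z

    # P(true Elo difference > 0) under Normal(elo, elo_stdev)
    los_sf = 1.0 - NormalDist(elo, elo_stdev).cdf(0.0)

    # Same quantity written with the complementary error function
    los_erfc = math.erfc(-elo / (math.sqrt(2) * elo_stdev)) / 2

    return los_sf, los_erfc


if __name__ == "__main__":
    print(los_from_margin(5.0, 5.0))
```

If the two returned values agree for a range of inputs, the erfc form should be safe to port to any language that offers an erfc function.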
Thanks for the suggestion, @gahtan-syarif!
I'll probably add what you suggested.
Can you provide the repository/source from where you got the code so I can double check it and acknowledge the authors?
Many thanks
You're welcome! The original code is from mathematician Michel Van den Bergh, who is responsible for the mathematics in Stockfish's testing framework, fishtest. The code for the pentanomial Elo can be found here: https://github.com/official-stockfish/fishtest/blob/master/server/fishtest/stats/stat_util.py, specifically the functions stats and get_elo. It's written in a more condensed format there, but the logic and the results should be the same as the version that I gave you, which is closer to the 3dkingdoms way of writing it for easier readability.
Here is also another way of writing the code, by OpenBench creator Andrew Grant, which he recently posted in the Stockfish Discord:
import math
from scipy.stats import norm

def elo(x):
    return -400 * math.log10(1 / x - 1)

R = [275, 4660, 8569, 4040, 174]
N = sum(R)
mu = sum((f / 4.0) * R[f] for f in range(len(R))) / N
var = sum(((f / 4.0) - mu)**2 * R[f] for f in range(len(R))) / N
mu_min = mu + norm.ppf(0.025) * math.sqrt(var) / math.sqrt(N)
mu_max = mu + norm.ppf(0.975) * math.sqrt(var) / math.sqrt(N)
print(elo(mu_min), elo(mu), elo(mu_max))
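In case it helps with porting, the same calculation can be sketched as a reusable function with no SciPy dependency. The names score_to_elo and pentanomial_summary are hypothetical, and the LOS line is my own addition (the normal CDF of the mean pair score's distance from 0.5 in standard errors), not part of the snippet above. It assumes the counts are ordered from worst to best pair outcome and that the interval bounds stay strictly between 0 and 1.

```python
import math
from statistics import NormalDist


def score_to_elo(score):
    """Convert an expected score in (0, 1) to an Elo difference."""
    return -400 * math.log10(1 / score - 1)


def pentanomial_summary(counts, confidence=0.95):
    """Return (elo_low, elo, elo_high, los) from five game-pair counts.

    Assumption: counts[i] is the number of game pairs scoring i / 4 of the
    available points, and the counts are not all in a single bucket
    (otherwise the variance is zero and the LOS is undefined).
    """
    n = sum(counts)
    scores = [i / 4 for i in range(len(counts))]
    mu = sum(s * c for s, c in zip(scores, counts)) / n
    var = sum((s - mu) ** 2 * c for s, c in zip(scores, counts)) / n
    stderr = math.sqrt(var / n)  # standard error of the mean pair score

    z = NormalDist().inv_cdf((1 + confidence) / 2)
    los = NormalDist().cdf((mu - 0.5) / stderr)
    return (
        score_to_elo(mu - z * stderr),
        score_to_elo(mu),
        score_to_elo(mu + z * stderr),
        los,
    )


if __name__ == "__main__":
    print(pentanomial_summary([275, 4660, 8569, 4040, 174]))
```

With the default confidence of 0.95, the first three values should match what the script above prints, since it uses the same 0.025 and 0.975 quantiles.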
@gahtan-syarif Many thanks, I'll check it later
Since you already have the pentanomial information for the match results, why not use the more accurate pentanomial Elo to replace the current WDL Elo? The formula to calculate it is pretty similar to the 3dkingdoms Elo formula that you're already using, with some minor adjustments. This is the Python version of the pentanomial Elo calculator: