official-stockfish / fishtest

The Stockfish testing framework
https://tests.stockfishchess.org/tests

Elo normalization #1750

windfishballad opened this issue 11 months ago (status: Open)

windfishballad commented 11 months ago

The LLR calculation is not done natively in normalized Elo. The normalized bounds are first converted to logistic bounds (using the current test's own results), and the log-likelihoods are then computed as posteriors under a uniform prior, conditional on the expected score matching the logistic Elo bound.
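A minimal sketch of the conversion step, assuming pentanomial counts ordered from 0 to 2 points per game pair, normalized Elo defined as (score - 0.5) / sigma_pg * 800 / ln(10), and a per-game sigma_pg estimated as sqrt(2 * var_pair). The names `pentanomial_stats`, `normalized_to_logistic` and `NELO_SCALE` are hypothetical and this is not the fishtest code; it only shows that the logistic bound is a function of the variance measured in the test itself.

```python
import math

# Assumption: normalized Elo = (score - 0.5) / sigma_pg * 800 / ln(10).
NELO_SCALE = 800 / math.log(10)


def pentanomial_stats(penta):
    """Mean and variance of the per-pair average score.

    penta[i] counts game pairs in which the tested engine scored i/2 points
    out of 2, i.e. a per-game average of i/4 (assumed ordering).
    """
    n = sum(penta)
    values = [i / 4 for i in range(5)]
    mu = sum(c * v for c, v in zip(penta, values)) / n
    var = sum(c * (v - mu) ** 2 for c, v in zip(penta, values)) / n
    return mu, var


def logistic_elo(score):
    """Logistic Elo difference corresponding to an expected score in (0, 1)."""
    return -400 * math.log10(1 / score - 1)


def normalized_to_logistic(nelo, penta):
    """Convert a normalized-Elo bound into a logistic-Elo bound.

    sigma_pg is estimated from the same counts the test is accumulating,
    so the returned bound shifts every time new pairs arrive.
    """
    _, var_pair = pentanomial_stats(penta)
    sigma_pg = math.sqrt(2 * var_pair)  # a pair averages two (assumed independent) games
    score = 0.5 + nelo / NELO_SCALE * sigma_pg
    return logistic_elo(score)


# Hypothetical counts at two points of the same test.
early = [10, 60, 120, 60, 10]
later = [30, 60, 80, 60, 30]
print(normalized_to_logistic(2.0, early), normalized_to_logistic(2.0, later))
```

With the same normalized bound, the higher-variance counts produce a larger logistic bound.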

Computing the posteriors natively, conditional on normalized Elo being equal to the bound, would be a much harder calculation.

The consequence, however, is that the LLR is not a proper random walk: each new pentanomial result restates the log-likelihood of all previous observations, since it changes the bounds. As a result, the theory behind the SPRT stopping bounds breaks down.
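To illustrate the random-walk point, the sketch below replaces fishtest's pentanomial likelihood with a simpler Gaussian approximation (an assumption made for brevity) and measures the LLR credited to the first pairs before and after one more pair updates the variance estimate. With fixed bounds each pair's term never changes; with bounds re-derived from the test itself, old terms get re-scored. All function names are hypothetical.

```python
import math

NELO_SCALE = 800 / math.log(10)  # assumed normalized-Elo scale


def mean_var(xs):
    """Population mean and variance of per-pair average scores."""
    mu = sum(xs) / len(xs)
    return mu, sum((x - mu) ** 2 for x in xs) / len(xs)


def bounds_from(var_pair, nelo0, nelo1):
    """Expected per-pair score under H0 and H1 for given normalized-Elo bounds."""
    sigma_pg = math.sqrt(2 * var_pair)
    return tuple(0.5 + n / NELO_SCALE * sigma_pg for n in (nelo0, nelo1))


def gaussian_llr(xs, s0, s1, var_pair):
    """Gaussian-approximation LLR: sum of per-pair terms
    [(x - s0)^2 - (x - s1)^2] / (2 var) = (s1 - s0)(2x - s0 - s1) / (2 var)."""
    return sum((s1 - s0) * (2 * x - s0 - s1) / (2 * var_pair) for x in xs)


def prefix_llr_before_after(xs, k, nelo0, nelo1):
    """LLR credited to the first k - 1 pairs, evaluated with the bounds and
    variance estimated before and after pair k arrives (k is 1-based).
    xs[:k - 1] must contain at least two distinct scores."""
    prefix = xs[:k - 1]
    out = []
    for seen in (xs[:k - 1], xs[:k]):
        _, var = mean_var(seen)  # variance taken from the test itself
        s0, s1 = bounds_from(var, nelo0, nelo1)
        out.append(gaussian_llr(prefix, s0, s1, var))
    return tuple(out)


# Hypothetical per-pair scores; pair 3 changes the variance estimate.
before, after = prefix_llr_before_after([0.5, 0.75, 0.25], 3, 0.0, 2.0)
print(before, after)  # differ: the evidence from pairs 1-2 was re-scored
```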

I recognize this is extremely unlikely to make a difference on any given test. But since the LLR is a logistic-Elo likelihood rather than a normalized-Elo likelihood, it would be cleaner, if easy to do, to normalize using exogenous data rather than the current test (for example, the trailing N fishtest games at this TC and thread count).
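A sketch of the proposed alternative, assuming the reference pentanomial counts come from some external source such as recent fishtest games at the same TC and thread count (the data source and the `FrozenBoundsSPRT` class are hypothetical). The bounds are computed once at construction, so each LLR increment depends only on the new pair and the standard Wald thresholds log(beta / (1 - alpha)) and log((1 - beta) / alpha) apply. A Gaussian approximation again stands in for fishtest's actual likelihood.

```python
import math

NELO_SCALE = 800 / math.log(10)  # assumed normalized-Elo scale


def pair_var(penta):
    """Variance of the per-pair average score from pentanomial counts (0..2 points)."""
    n = sum(penta)
    values = [i / 4 for i in range(5)]
    mu = sum(c * v for c, v in zip(penta, values)) / n
    return sum(c * (v - mu) ** 2 for c, v in zip(penta, values)) / n


class FrozenBoundsSPRT:
    """SPRT whose score bounds are fixed once from reference (exogenous) data."""

    def __init__(self, nelo0, nelo1, reference_penta, alpha=0.05, beta=0.05):
        self.var = pair_var(reference_penta)  # exogenous, never updated
        sigma_pg = math.sqrt(2 * self.var)
        self.s0, self.s1 = (0.5 + n / NELO_SCALE * sigma_pg for n in (nelo0, nelo1))
        self.lower = math.log(beta / (1 - alpha))  # Wald lower threshold
        self.upper = math.log((1 - beta) / alpha)  # Wald upper threshold
        self.llr = 0.0

    def add_pair(self, x):
        """Add one pair's average score x in {0, 0.25, 0.5, 0.75, 1}.

        Returns "H1" or "H0" once a threshold is crossed, else None."""
        # The increment depends only on x, so the LLR is a sum of iid steps.
        self.llr += (self.s1 - self.s0) * (2 * x - self.s0 - self.s1) / (2 * self.var)
        if self.llr >= self.upper:
            return "H1"
        if self.llr <= self.lower:
            return "H0"
        return None


# Hypothetical reference counts standing in for trailing games at this TC.
sprt = FrozenBoundsSPRT(0.0, 2.0, [10, 60, 120, 60, 10])
print(sprt.s0, sprt.s1, sprt.add_pair(0.75))
```

Freezing the variance in the likelihood as well as in the bounds is a choice made in this sketch; keeping the current-test variance in the denominator would reintroduce a (smaller) dependence on past data.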

shermansiu commented 5 months ago

Are you able to contribute a PR for this?